Improving Mosaic Performance

[updated 16/12/19]

Over the last few months, we’ve had a number of calls to Mosaic support asking how we manage the speed of page loading and content editing. We thought it would be helpful to share our current and planned work in this area, so that you know what performance you can expect for your website.

Page caching

The Mosaic infrastructure includes a component called a cache, which speeds up the delivery of web pages to website viewers. The first time a page is requested after being published, the request goes through to the Mosaic back-end (the flowers in the image below). At that point a copy of the page is made and stored in the cache (the nectar jars). Subsequent requests (the bees) are sent the cached copy, provided the page hasn’t changed in the meantime. This is faster because the copy is already assembled and simply needs to be returned. You can read more to understand how your site’s webpages are cached and how cached copies are updated when you edit a page.

Page caching helps to speed up the delivery of web pages to website viewers. However, it cannot improve the speed of content editing, as this always needs to be done on the Mosaic back-end. Nor does it help with pages restricted behind SSO login: the back-end must first check that the user is authorised to see the page before it can be served.
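For readers who find code easier to follow than bees, the caching behaviour described above can be sketched roughly as follows. This is a minimal illustration only, not the actual Mosaic or Acquia implementation: the PageCache class, the render function, the version number used to detect edits, and the requires_login flag are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class PageCache:
    # Hypothetical stand-in for the back-end assembling a page (the flowers).
    render: Callable[[str], str]
    # Cached copies (the nectar jars): path -> (page version, assembled HTML).
    _store: Dict[str, Tuple[int, str]] = field(default_factory=dict)
    # Counts how often a request had to go through to the back-end.
    backend_hits: int = 0

    def get(self, path: str, version: int, requires_login: bool = False) -> str:
        # Login-restricted pages always go to the back-end, because the
        # authorisation check has to happen there before the page is served.
        if requires_login:
            self.backend_hits += 1
            return self.render(path)

        cached = self._store.get(path)
        # Serve the stored copy only if the page has not changed since.
        if cached is not None and cached[0] == version:
            return cached[1]

        # First request after publishing (or after an edit): assemble the
        # page on the back-end and keep a copy for later requests.
        self.backend_hits += 1
        html = self.render(path)
        self._store[path] = (version, html)
        return html


cache = PageCache(render=lambda path: f"<html>{path}</html>")
cache.get("/about", version=1)   # goes to the back-end, copy stored
cache.get("/about", version=1)   # served from the cache
cache.get("/about", version=2)   # page was edited, so back-end again
print(cache.backend_hits)
```

In this sketch, repeat requests for an unchanged public page never touch the back-end, while every request for a login-restricted page does, which is why reducing the amount of SSO-restricted content can help overall performance.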

The image shows 2 bees retrieving nectar from some jars, a metaphor for a web cache. The jars sit in front of a bunch of flowers, a metaphor for the webserver.

 

Capacity

The Mosaic team regularly reviews the capacity of our system architecture with our Cloud hosting partner, Acquia, to make sure that it is large enough to keep pace with the volume of requests as the number of sites on the platform grows.

Capacity refers to the amount of processing power available to the platform (CPU), and the amount of memory available (see below). It also relates to the number of servers used and the way they are arranged together. This ensures that resources are available in each layer to respond effectively to the various tasks that the servers need to carry out.

Five hardware updates have been carried out on the Mosaic platform since the service launched in June 2017. In July 2019 a major upsize of the whole platform converted the single-tier server architecture to a multi-tier architecture. This change increased the server allocations, separated out the database and filestore into their own layer, and doubled the number of web servers.

The image shows 3 bees seeking nectar from a bunch of flowers, as a metaphor for concurrent serving of web pages

Memory

When a process on a server does not have sufficient memory available to it, the task will time out before it has completed. To ensure this does not happen, we make sure there is a sufficient amount of memory available for all the tasks the Mosaic servers need to complete. The amount of memory allocated to different types of tasks can also be managed, to ensure that the overall amount is used efficiently and memory remains for other tasks to use. We are currently reviewing with Acquia the finer details of how this is set up, in order to eliminate the small number of cases where tasks time out. (In these rare cases you may see an Acquia ‘Temporarily Unavailable’ error.)

The image shows a bee stopping in mid-flight with text 'What did I come here for, again?', to illustrate how a server process can time out when there is inadequate memory.

Efficiency

When a page is requested, Mosaic assembles the page components on the fly, according to the code for the features the page has been built with. It then delivers the page to the browser. Over time, as features are refined and amended, the code can get complex. Where the code is not as streamlined as it could be, it may take longer to run than it needs to. We are currently working with Acquia to identify processes that take a long time to complete, and analysing whether the code used for these can be tightened up to make them more direct.

The image shows a bee looping in circles as it flies towards a bunch of flowers, as a metaphor for how inefficient code can slow down the processing of a page request.

Attack protection

Sadly, malicious attacks are now commonplace across the internet. All website administrators are familiar with a common type of attack known as a Denial of Service (or DOS) attack. In these cases, a malicious web user, who can be located anywhere in the world, sends a large number of requests at the same time in an effort to overload the webserver (the swarm in the picture below). This does not usually cause a problem for web users viewing Mosaic pages, since the capacity of the cache is large enough to handle a great number of requests at once. But for content editing, the result can be that Mosaic runs very slowly or becomes unavailable.

We monitor for these attacks and are typically able to block them, once we have analysed where they come from. However, this does take time – typically around 30-60 minutes and sometimes longer – which is irritating to put up with while the attacks are happening.

Over the last few months, in common with other sites in the University’s web domain, the number of these attacks taking place on Mosaic has increased greatly. We are therefore currently investigating tools to prevent these from occurring, in preference to responding to them once they have already begun.

The image shows a swarm of bees nearly completely covering a bunch of flowers as a metaphor for a Denial of Service attack

So, what are we doing to improve things?

The Mosaic team, in partnership with our Cloud hosts, Acquia, regularly monitors a range of metrics to track the performance of the platform. Throughout 2019, in particular, we have been keeping a close eye on our day-to-day platform management so that we:

  • plan for capacity growth
  • optimise our management of server resources
  • improve the efficiency of our code
  • deal with DOS attacks

As a result, we made 20 changes to Mosaic in 2019 alone to address performance.

However, we recognise that this day-to-day management has not provided the level of improvement required. We have therefore developed a Roadmap for Performance Enhancement, which we are in the process of executing:

 

| | Improvement | Date | Status |
|---|---|---|---|
| 1 | Upscale the Platform from Single- to Multi-Tier | July 2019 | Complete |
| 2 | Build a dedicated site to benchmark Platform speed and a set of automated tests to be run at regular intervals to provide metrics over time for typical viewing and editing scenarios | September – October 2019 | Complete |
| 3 | Commission Acquia Professional Services to carry out a Performance Audit to identify areas of code and configuration to improve | October 2019 | Complete |
| 4 | Implement recommended Platform upsize to increase capacity | December 2019 | Complete |
| 5 | Implement code and configuration remediations, as indicated by Audit findings | Started December 2019 | Due to complete Easter 2020 |
| 6 | Source and implement a tool to provide DOS attack protection | October 2019 - tbc | In Progress |
| 7 | Work with content owners to reduce the extent of SSO-restricted content to a minimum, moving other content to SharePoint | Ongoing | In Progress |

By addressing these root causes of slow running, we expect to be able to provide a much more consistent level of good performance on the Mosaic Platform. We look forward to updating you on progress as this work proceeds.