If your website is a social network, this post is probably not for you. If you have a blog, a news site or an e-commerce site, it might be!
This screenshot from New Relic compares server load (cyan line) with pageviews (yellow line) before and after a push notification is sent.
We achieved these great results through fanatical use of a CDN, integrated really deeply into our servers. We kept repeating the mantra that disconnecting the correlation between pageviews and CPU would let us scale without actually scaling hardware.
A year ago we started a big change to FTBpro’s website. We completely redesigned it, moved to a single-page architecture and began exploring new ways to minimise the load on our servers. Later we applied the lessons we learned and the methodologies we developed to our mobile API. On the outside, the result is the FTBpro.com site and mobile app as you know them today. We had two goals in mind - make the user experience faster and lower the load on the servers. This post is about the latter.
Disconnect the correlation between Pageview and CPU
We need to make sure CPU power is not wasted rendering the same data twice, three times or more. Why run the same SQL query twice when you know nothing has changed…
If a page is requested twice within a reasonable timeframe, don’t rebuild it. This can be achieved with full-page caching at one of three levels:
- In the application itself (e.g. a full-page cache backed by MemCached)
- In a reverse proxy sitting in front of the application (e.g. Varnish or Nginx)
- In a CDN sitting in front of everything
The first and second approaches are slightly easier to manage, as they are contained within your own server, but they don’t eliminate the correlation between pageviews and CPU - they only reduce it to a minimum. If the first page took 200ms to render, the cached version could be returned in 1ms or even less. With a CDN, the cached version doesn’t even hit the main server, so the correlation can truly be called “disconnected”. The downside of using a CDN is usually the hassle of choosing one, setting it up and negotiating a good contract - there are tons of CDNs out there and many small parameters that distinguish between them.
The naive approach to full-page caching is setting an expiration time on a page (e.g. 15 minutes), so every X minutes the page expires and the CDN fetches the freshest version from the server. That’s okay and very easy to manage, but it has two disadvantages:
- Data updates don’t appear to the user straight away.
- There will still be a correlation between pageviews and CPU, even if a low one.
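To illustrate the second disadvantage, here is a minimal plain-Ruby sketch of TTL-based full-page caching (the class and the render stand-in are illustrative, not our production code): every miss after expiry still costs a render on the application server, so CPU keeps tracking pageviews.

```ruby
CACHE_TTL = 15 * 60 # naive expiration time: 15 minutes, in seconds

PageEntry = Struct.new(:body, :cached_at) do
  def fresh?(now)
    (now - cached_at) < CACHE_TTL
  end
end

class NaivePageCache
  attr_reader :renders

  def initialize
    @store = {}
    @renders = 0
  end

  # Return the cached body if it is still fresh; otherwise "render" again.
  def fetch(url, now = Time.now)
    entry = @store[url]
    return entry.body if entry && entry.fresh?(now)

    @renders += 1                    # every miss costs CPU on the app server
    body = "rendered:#{url}"         # stand-in for the expensive page render
    @store[url] = PageEntry.new(body, now)
    body
  end
end
```

Fetching the same URL within the TTL is free, but once 15 minutes pass the next request triggers a full render again - so load still grows with traffic, just more slowly.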
To overcome the second disadvantage we can simply set the expiration time to never ;-) but then our page is bound to become stale at some point, as editorial updates will never be reflected. This can be solved by using a CDN provider that offers purge or load APIs: in the server layer, attach an expiration event to the classes in charge of updating the data on these pages (e.g. Post#after_save in a Rails app). Most CDN providers have these APIs, but two important criteria differ from one provider to another:
- Speed: some CDNs purge in 200ms, some in 1 minute, some in 45 minutes
- Purge criteria: some CDNs allow purging by exact URL, some allow regular expressions, and some force you to “tag” each URL in the HTTP response headers and purge by those tags (much more work, but it can give the best results in some cases)
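As a rough illustration of wiring a purge to a model update, here is a plain-Ruby sketch without Rails. `PurgeClient` stands in for a real CDN purge API client (endpoints, auth and payload are provider-specific, so here it only records URLs), and `Post#save` invokes an `after_save` hook the way ActiveRecord callbacks would.

```ruby
# Stand-in for a CDN purge API client; a real one would POST each URL
# to the provider's purge endpoint.
class PurgeClient
  attr_reader :purged

  def initialize
    @purged = []
  end

  def purge(url)
    @purged << url
  end
end

# A plain-Ruby stand-in for an ActiveRecord model with an after_save hook.
class Post
  def initialize(slug, purger)
    @slug = slug
    @purger = purger
  end

  def save
    # ...persist the record here...
    after_save # Rails runs callbacks like this automatically on save
  end

  private

  def after_save
    @purger.purge("/posts/#{@slug}")
  end
end
```

With this in place, every editorial save immediately evicts the stale copy from the cache, so pages can be cached forever without ever serving outdated content.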
So what did we do?
- Configured the CDN to keep all our pages forever - they never expire.
- Modified the URL structure of our APIs (mobile & web) to a pattern that is purgeable. For example, our CDN couldn’t purge based on query-string parameters, so we had to move URLs to a restful structure: /feed?team=arsenal became /feed/arsenal.
- Added “expirators” to our different models. Whenever a post is saved we expire its URL and the URLs of the feeds that should contain it, e.g. updating a post about a game between Arsenal and Barca will expire the URL of the post itself, Arsenal’s feed, Barca’s feed, the Premier League feed and the La Liga feed (both on mobile and web).
- Before sending a push notification about a post, it is automatically preloaded into the CDN. At these moments we can get up to 100k requests a minute to the website, and none of them reaches our servers.
- Added an application-level full-page caching layer backed by MemCached, after realising that a CDN is made up of many independent servers, each of which will hit our application server when it doesn’t have the cached version, creating real load.
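The “expirator” idea from the list above can be sketched in plain Ruby. The team-to-league mapping and URL patterns below are illustrative, modeled on the restful /feed/&lt;name&gt; scheme; a real implementation would hand the resulting list to the CDN’s purge API.

```ruby
# Given a saved post, enumerate every cached URL that should be purged:
# the post's own page, each tagged team's feed, and each team's league feed.
class PostExpirator
  # Illustrative mapping; in a real app this would come from the database.
  LEAGUES = {
    "arsenal"   => "premier-league",
    "barcelona" => "la-liga",
  }.freeze

  def urls_for(post_id:, teams:)
    urls = ["/posts/#{post_id}"]          # the post's own page
    teams.each do |team|
      urls << "/feed/#{team}"             # the team's feed
      league = LEAGUES[team]
      urls << "/feed/#{league}" if league # the team's league feed
    end
    urls.uniq                             # never purge the same URL twice
  end
end
```

For a post tagged with both Arsenal and Barcelona, this yields five URLs to purge - the post page plus both team feeds and both league feeds - matching the example above.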
The results:
- The user experience became much better, because all requests are served from light and fast CDN servers that are geographically near the user.
- We use exactly the same server resources for the 100m pageviews we serve today as we did for the 5m pageviews we had 7 months ago.
- We chose a CDN that fits our needs. We pay them a small fraction of what we paid our former CDN, with a 20x increase in load.
That’s a good opportunity to praise Edgecast, the CDN we use. They have exceeded our expectations on every parameter:
- Amazing quality of service. They respond quickly to emails, they are available on the phone, and they stay with you and keep helping for as long as it takes.
- Technology. Their user interface is a bit sluggish, but it lets us go really wild and set different configuration rules based on our crazy URL structure. And they purge fast - anywhere from a few seconds to 1-2 minutes per purge.
- Great price. Price wasn’t the main criterion in choosing a CDN, but Edgecast happened to be very affordable nonetheless.
by Dor Kalev, CTO @ FTBpro