Jaap Haagmans - The all-round IT guy

5 Jan 2014

Optimizing Magento to become a speed monster

The post title might be a bit bold, but, contrary to popular belief, it's by no means impossible to have a blazingly fast Magento store. Most of the gains aren't in quick fixes though; some of the changes will require quite a bit of time.

MySQL query caching

MySQL isn't optimized for Magento out of the box. One general mechanism can make a world of difference for Magento: the query cache. Try playing with the following settings and measure the results (e.g. using New Relic):

query_cache_type = 1
query_cache_size = 256M

This will enable MySQL to store up to 256M of query results, so that the most common queries (like the ones that load your frontpage products) can be served straight from memory. You can evaluate the actual query cache usage by running the command

SHOW STATUS LIKE '%Qcache%';

A production database should show some free memory for the query cache. Increase the size if it runs out.
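As a rough indicator, you can also compare the number of query cache hits to the total number of SELECT statements. This is just a quick sketch; the thresholds you aim for will depend on your store:

SHOW GLOBAL STATUS LIKE 'Qcache_hits';
SHOW GLOBAL STATUS LIKE 'Com_select';
-- hit rate is roughly Qcache_hits / (Qcache_hits + Com_select)
-- Qcache_free_memory near zero (or a growing Qcache_lowmem_prunes) means the cache is too small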

PHP Opcode caching

Opcode caching is a mechanism that enables you to store the compiled opcode of your PHP files in shared memory. This reduces access time and eliminates PHP parsing time, meaning PHP files can be executed faster. For Magento, this could easily reduce the time needed for certain actions by seconds. My favourite opcode caching mechanism is APC, simply because it's maintained by the folks at PHP.
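To give an idea, these are the php.ini settings I would start from when enabling APC; the shm_size value is just a starting point and should be tuned to your installation:

extension=apc.so
apc.enabled=1
apc.shm_size=256M
apc.stat=1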

Magento Compilation

Even though recent development has reduced the need for the Magento compiler, there is still a (small) performance gain, if you don't mind having to recompile your installation after you make changes to its files. The compiler copies and concatenates Magento's PHP class files into a single include directory, which reduces the time Magento has to spend searching the filesystem.
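If you want to try it, Magento ships with a small shell script to control the compiler. Run it from the Magento root (and remember to recompile after every deployment):

php -f shell/compiler.php -- state
php -f shell/compiler.php -- compile
php -f shell/compiler.php -- disable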

Store Magento cache in memory

By default, Magento stores its cache files under var/cache. Now, let me first point out that caching should always be enabled for performance reasons (simply because it reduces disk I/O and database calls). However, storing these cache files on the filesystem still incurs I/O overhead. If your server is backed by SSD disks, this overhead is fairly small, but if not, you can gain a lot by storing the cache in memory. Magento supports memcached out of the box (although your server of course needs memcached installed), but I recently switched to Redis using an extension called Cm_Cache_Backend_Redis. We run it on a separate server because we have multiple webservers, but you might not need to.
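For reference, the Redis backend is configured in app/etc/local.xml inside the global node. This is a minimal sketch assuming Redis runs locally on the default port; adjust the server address if you run it on a separate machine like we do:

<cache>
  <backend>Cm_Cache_Backend_Redis</backend>
  <backend_options>
    <server>127.0.0.1</server>
    <port>6379</port>
    <database>0</database>
    <compress_data>1</compress_data>
  </backend_options>
</cache>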

Use SSD disks

Magento is quite I/O heavy. Where a conventional spinning drive tops out at around 150-200 IOPS and degrades quickly under load, an SSD can easily do 50 times as many IOPS or even more. If I/O is your bottleneck, SSD is the way to go.

Use front end caching

I know that there are Full Page Caching modules available for Magento, but I recommend dropping them in favour of a front end caching mechanism. Varnish is probably the way to go at this time. The main reason to go for front end caching is that it's highly configurable.

Varnish stores a copy of the webserver's response in shared memory. If another visitor visits the same page, Varnish will serve the page from memory, taking the load off the webserver. Because some pages are dynamic or have dynamic parts, it's important to configure Varnish so that it only caches content that is (more or less) static. Varnish also supports ESI, which enables you to pass through blocks of content from the webserver. If you use a Magento extension to enable Varnish caching, it will do most of the configuration for you (although extensive testing is required).
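To give an impression of what that configuration involves, here's a minimal sketch of a vcl_recv rule that passes Magento's session-bound pages (cart, checkout, customer account) straight to the backend. The URL patterns are assumptions based on a default Magento URL layout; the extensions mentioned below generate far more complete rules for you:

sub vcl_recv {
    # never cache session-bound pages; pass them straight to the webserver
    if (req.url ~ "^/(checkout|customer|wishlist|api)") {
        return (pass);
    }
}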

There are two extensions I have tried, the first one being PageCache. I prefer Turpentine though, because it's more powerful and supports ESI.

23 Jul 2013

Setting up your own dynamic CDN with edge locations using Varnish and SSL

As I mentioned earlier in my post about the new SSL functionality for Amazon CloudFront, it's possible to set up your own CDN with "edge" locations. I prefer calling them edgy though, because we're using Amazon regions and not the real Amazon edge locations. But it will give us some flexibility: you only serve from the regions you think you need (thus saving costs) and you can always add your own edge location hosted at a datacenter outside of AWS (for instance, somewhere in Amsterdam).

Please note that I haven't built the POC environment for this yet. I'm fairly confident that the setup below will work, but please comment if you don't agree.

Basically, what we want is to send visitors to the content location nearest to them. At these locations, we will cache static content, try to cache dynamic content as much as possible, and serve everything over SSL. Take a look at this sketch for a visual overview of what we'll try to do:

[Sketch: DNS - edge locations - origin]

The DNS

For the DNS, we will of course use Amazon's Route 53. Route 53 supports latency based routing, which sends every client to the endpoint with the lowest latency, and it can also do health checks so that unhealthy endpoints are taken out of rotation. Read more about latency based routing in the AWS docs. Set it up to include your edge locations and monitor the health of these locations.
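To illustrate, a latency record set for one edge location could look roughly like this (the domain, IP address and health check ID are placeholders; you'd create one such record per region):

{
  "Name": "content.example.com.",
  "Type": "A",
  "SetIdentifier": "edge-eu-west-1",
  "Region": "eu-west-1",
  "TTL": 60,
  "ResourceRecords": [ { "Value": "203.0.113.10" } ],
  "HealthCheckId": "<your health check id>"
}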

The Edge locations

This is where it gets interesting. There are a few possible ways to go. You could set up a simple Apache/nginx server to host your content, but then you'd have to worry about keeping copies of your content on every server. It's possible, but it might not be as easy to manage. Besides, it doesn't give you an easy way to serve dynamic content.

I've chosen a Varnish based caching solution for this specific use case, because it's very flexible and provides a lot of tweaking options. Besides, Varnish will perform well on a relatively "light" server. Varnish will not be able to handle SSL termination though, so we will use nginx as a proxy to offload SSL. You can read how to Offload SSL using nginx in their wiki.

Setting up your specific Varnish environment is outside the scope of this article, because there are too many use cases to put into one single article. I will provide a few things to consider though.

Let nginx only handle SSL traffic

Varnish is perfectly able to handle unencrypted traffic itself, so nginx should only listen on port 443 and Varnish can listen on port 80.
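A minimal sketch of what the nginx SSL offloading server block could look like, assuming Varnish listens on port 80 on the same machine (the hostname and certificate paths are placeholders):

server {
    listen 443 ssl;
    server_name content.example.com;

    ssl_certificate     /etc/nginx/ssl/content.example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/content.example.com.key;

    location / {
        # hand the decrypted request to Varnish on port 80
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
    }
}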

Use auto-scaling

For some, this is a no-brainer. I think it's best practice to always use auto-scaling. You will probably not want to scale out your edge locations, but you do want unhealthy EC2 instances to be terminated and replaced automatically. Something to consider here is that, normally, a replacement instance will not keep the same IP address, so you'll have to work around that. A possible workaround is using an ELB, but you would need one for every region you're using, and that will cost you more than the instance itself. Another option is to "detach" an Elastic IP on termination and attach it again in the launch sequence of your new EC2 instance, but I don't have a ready-to-go script for that solution (yet); a rough sketch of the idea follows below.
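As a very rough, untested sketch of that idea: the new instance could re-associate a pre-allocated Elastic IP with itself at boot, assuming the AWS CLI is installed and the instance has an IAM role allowing ec2:AssociateAddress. The allocation ID and region are placeholders:

#!/bin/bash
# Re-attach a pre-allocated Elastic IP to this replacement instance at launch
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 associate-address \
    --instance-id "$INSTANCE_ID" \
    --allocation-id eipalloc-0123456789 \
    --allow-reassociation \
    --region eu-west-1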

Consider whether dynamic content can be cached

If you have a session-based website with lots of changing data, it might not pay off to try and cache the dynamic data. If so, use your CDN purely for static content on a separate domain. The CDN step adds latency on cache misses, so if the miss rate is very high, you might be better off querying your server directly for dynamic content. If, for instance, you use content.example.com as your CDN URL and point Varnish to www.example.com as your origin (see the snippet below), you can set your application to use content.example.com for all static file references (images, javascripts, stylesheets) and www.example.com for all other URLs.
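In VCL terms, pointing the edge location at the origin is as simple as a backend definition like this (the hostname and port are assumptions for this example):

backend origin {
    .host = "www.example.com";
    .port = "80";
}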

Distributing the SSL certificate

Your servers can run solely on ephemeral storage, thanks to the auto-scaling setup. However, one thing that needs to be consistently distributed across your edge locations is the SSL certificate itself. I suggest using S3 for this. Give your instances an IAM role that is allowed to read from the bucket where you store your certificates and have them pull the necessary files from S3 on launch. This can also be done for the nginx and Varnish config files if you like.
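For instance, the launch script could do something along these lines (the bucket name and paths are placeholders; it assumes the AWS CLI and an IAM role with read access to the bucket):

#!/bin/bash
# Pull the SSL certificate and key from a private S3 bucket at launch
aws s3 cp s3://my-cert-bucket/content.example.com.crt /etc/nginx/ssl/
aws s3 cp s3://my-cert-bucket/content.example.com.key /etc/nginx/ssl/
chmod 600 /etc/nginx/ssl/content.example.com.key
service nginx restart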

The origin

The origin can be anything you like; it doesn't even have to be hosted on AWS. It could be an S3 bucket or simply your website on a load-balanced EC2 setup. If, for instance, your origin serves a heavy PHP website using Apache (like a Magento webshop), you will reduce the load on Apache tremendously by letting it do only the heavy lifting instead of also serving all those small static files. I've seen heavily load-balanced setups that could be reduced to half their size simply by using a CDN.