
5 Jan 2014

Optimizing Magento to become a speed monster

The post title might be a bit bold, but, contrary to popular belief, it's by no means impossible to have a blazingly fast Magento store. Most of the gains aren't quick fixes, though: some of these changes will require quite a bit of time.

MySQL query caching

MySQL isn't optimized for Magento out-of-the-box. There's one general mechanism that can make a world of difference for Magento, which is called the query_cache. Try to play with the following settings and measure the results (e.g. using New Relic):

query_cache_type = 1
query_cache_size = 256M

This will enable MySQL to store 256M of query results, so that the most common queries (like the ones that load your frontpage products) can be served straight from memory. You can evaluate the actual query cache usage by running the command

SHOW STATUS LIKE '%Qcache%';

A production database should show some free memory for the query cache. Increase the size if it runs out.
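
As a rule of thumb (the status variables are standard MySQL, the interpretation is my own): a steadily climbing Qcache_lowmem_prunes means MySQL is evicting cached results for lack of space, and a high Qcache_free_blocks count indicates fragmentation, which you can fix without emptying the cache:

SHOW GLOBAL STATUS LIKE 'Qcache_lowmem_prunes';
SHOW GLOBAL STATUS LIKE 'Qcache_free_blocks';

FLUSH QUERY CACHE; -- defragments the cache, does not empty it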

PHP Opcode caching

Opcode caching is a mechanism that enables you to store the compiled opcode of your PHP files in shared memory. This reduces access time and eliminates PHP parsing time, meaning PHP files can be executed faster. For Magento, this could easily reduce the time needed for certain actions by seconds. My favourite opcode caching mechanism is APC, simply because it's maintained by the folks at PHP.
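
To give an idea, enabling APC mostly comes down to a few php.ini directives (the shm_size value below is just a starting point to tune for your catalog, and exact syntax can vary slightly between APC versions):

extension = apc.so
apc.enabled = 1
apc.shm_size = 256M
apc.stat = 1 ; set to 0 for an extra gain if files only change on deploys

With apc.stat = 0, APC skips checking file modification times on every request, but you must clear the cache (or restart PHP) after every deploy.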

Magento Compilation

Even though recent developments have reduced the need for the Magento compiler, there's still a (small) performance gain to be had, if you don't mind recompiling your installation after every change to its files. The Magento compiler concatenates PHP files and puts them in a single folder, which reduces the time Magento has to spend in the filesystem.
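
If you want to try it, Magento 1 ships with a command line compiler (paths relative to your Magento root; remember to recompile, or disable compilation, before deploying code changes):

php -f shell/compiler.php -- state
php -f shell/compiler.php -- compile
php -f shell/compiler.php -- enable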

Store Magento cache in memory

By default, Magento stores its cache files under var/cache. Now, let me first point out that caching should always be enabled for performance reasons (simply because it reduces disk I/O and database calls). However, storing these cache files on the filesystem still incurs I/O overhead. If your server is backed by SSDs, this overhead is pretty small, but if not, you can gain a lot by storing the cache in shared memory. Magento supports memcached out-of-the-box (although your server of course needs memcached installed), but I recently switched to Redis using an extension called Cm_Cache_Backend_Redis. We run it on a separate server because we have multiple webservers, but you might not need to.
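
For reference, switching the cache backend happens in app/etc/local.xml. The structure below follows the Cm_Cache_Backend_Redis README; the server and database values are placeholders for your own setup:

<config>
  <global>
    <cache>
      <backend>Cm_Cache_Backend_Redis</backend>
      <backend_options>
        <server>127.0.0.1</server> <!-- or your dedicated Redis host -->
        <port>6379</port>
        <database>0</database>
        <compress_data>1</compress_data>
      </backend_options>
    </cache>
  </global>
</config>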

Use SSD disks

Magento is quite I/O heavy. Whereas an IDE drive will only do 150-200 IOPS and degrades rapidly over time, an SSD can easily do 50 times as many IOPS or even more. If I/O is your bottleneck, SSDs are the way to go.

Use front end caching

I know that there are Full Page Caching modules available for Magento, but I recommend dropping them in favour of a front end caching mechanism. Varnish is probably the way to go at this time. The main reason to go for front end caching is that it's highly configurable.

Varnish stores a copy of the webserver's response in shared memory. If another visitor requests the same page, Varnish serves it from memory, taking the load off the webserver. Because some pages are dynamic or have dynamic parts, it's important to configure Varnish so that it only caches content that is (more or less) static. Varnish also supports ESI, which enables you to pass through blocks of dynamic content from the webserver within an otherwise cached page. If you use a Magento extension to enable Varnish caching, it will do most of the configuration for you (although extensive testing is required).
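
To give an idea of what that configuration looks like, here's a heavily simplified vcl_recv routine (Varnish 3 syntax; the URL list is illustrative, and an extension like Turpentine generates a far more complete configuration than this):

sub vcl_recv {
    # never cache session-dependent pages
    if (req.url ~ "^/(checkout|customer|wishlist|admin)") {
        return (pass);
    }
    # strip cookies on everything else so Varnish will cache it
    unset req.http.Cookie;
    return (lookup);
}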

I've tried two extensions: PageCache and Turpentine. I prefer Turpentine, because it's more powerful and supports ESI.

5 Aug 2013

Databases in a multi region setup

When it comes to hosting your server cluster in multiple datacenters, the database setup is one of the things most people have difficulty with. I'll do a more generic "do's and don'ts" article on multi-region setups later on, but this is something I've run into myself on numerous occasions (also outside of AWS). Do mind that this article focuses on MySQL-based use cases, but I'll also tell you why you should consider NoSQL in some situations.

Before talking about complicated forms of master-master replication, I'll provide you with some alternatives that might fit your own use case.

What Amazon does

For its own e-commerce operation, Amazon runs geo-specific websites: in the US the website runs on Amazon.com, in the EU it runs on Amazon.co.uk and Amazon.de. This partly eliminates the need to have a 1-on-1 copy of the database available in every region. Amazon also does edge caching using CloudFront to speed up its websites worldwide (which is why amazon.com feels fast from the EU until you log in or order something). And if placing the order takes too long, you're shown a thank-you page telling you to wait for confirmation by e-mail (meaning your order has been queued).

Of course, this may not be the scenario you were looking for. Most medium-sized e-commerce companies don't have a website for every region. They just want to manage their website from one central location, but they also want someone from the other end of the world to experience a fast website.

Database slave nodes across the globe

If your application is built so that database writes only occur from the backend, which is managed by the company (e.g. a purely informational website with a CMS), you can set up slave nodes in multiple regions. Writes to the master database are propagated to its slaves and the new data is available to the world.
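
A minimal sketch of what that takes (standard MySQL replication directives; the server IDs, hostnames and credentials are placeholders):

# my.cnf on the master, e.g. in your primary region
server-id = 1
log-bin = mysql-bin

# my.cnf on each regional slave
server-id = 2
read-only = 1

-- then, on each slave, using the file and position reported by
-- SHOW MASTER STATUS on the master:
CHANGE MASTER TO MASTER_HOST='master.example.com',
  MASTER_USER='repl', MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=107;
START SLAVE;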

As before, this scenario only works for specific use cases, but it can be useful for some. In this particular case you could also consider putting your entire website behind CloudFront, but that may be more expensive (especially with multiple websites running over SSL), and if your website is updated irregularly, it might require you to alter your change workflow to protect consistency.

Optimize your application for multiple regions

You can build an application that is optimized for running in multiple regions. You will have to dig a little outside the scope of AWS for this, but there are tremendous advantages. Take the use case of Twitter, for example. A tweet can't be edited, only deleted. This means Twitter can (if necessary) queue the creation of tweets. Using AJAX, the tweet is immediately displayed, but the database write can take a little longer. This eases and spreads the load on the database servers, and because you don't have to wait for the actual transaction to complete, tweeting feels very fast to the end user (even though the process might occasionally take minutes).

Yeah but, you might say, Twitter must have some kind of super-database behind it that can easily replicate across the globe. You might be surprised to hear that Twitter actually uses MySQL to store tweets. Granted, it uses a very fine-grained, resilient replication and distribution framework (named Gizzard) and it aggregates and caches a lot of data using other techniques (like FlockDB and memcached), but MySQL is still the main data store.

Also, take a look at the other end of the spectrum, where you have the fully distributed database model of Bitcoin transactions. There's no master database; every full client keeps a copy of the transaction database. All connected clients are notified of any transaction, and they in turn notify their peers. This is how the database is synchronized across the globe.

Global replication with multiple masters

However, if you must use global replication, it's not impossible, although it might take some time to architect and set up. I'd like to take PayPal as an example. PayPal uses a transaction-based consistency model, meaning that everything can be rebuilt from the underlying transactions. PayPal also uses replication across four AWS regions. Let's say it stores your current credit level in a table and your credit is 10. If two people send you money at the same time, the payments may be processed in two different regions: region 1 processes a payment of 20 and sets your credit to 30, while region 2 processes a payment of 10 and sets it to 20. When those two updates meet, an inconsistency is detected and the real credit (40) can be recalculated from the underlying transactions (which are always unique). This is how PayPal can run a multi-master setup on MySQL.
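
A minimal sketch of that idea in SQL (the table and column names are hypothetical, not PayPal's actual schema): the stored credit is treated as a cache, and the immutable transactions table is the source of truth:

-- recalculate the real credit from the transaction log
SELECT SUM(amount) AS real_credit
FROM transactions
WHERE account_id = 42;

-- repair the cached balance after a conflict is detected
UPDATE accounts
SET credit = (SELECT SUM(amount) FROM transactions WHERE account_id = 42)
WHERE id = 42;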

Master-master replication in MySQL is usually set up so that each node is a slave and a master at the same time. PayPal uses replication in a ring: the MySQL node in region 1 is master to the node in region 2 (its slave), which is master to the node in region 3, which is master to the node in region 4, which in turn is master to the node in region 1. This won't sound appealing to every database administrator, because the round-trip time for a write to propagate can be quite high. In PayPal's case this is fine, because every transaction is unique (and, like tweets on Twitter, can't be edited), so an eventually-consistent model suits them. But what if you have a typical MySQL PHP application that lets MySQL handle ID generation using auto_increment? Two inserts into the same table on different nodes will cause a replication error. If the round-trip time for write consistency is 2 seconds (which could be quite conservative), you can imagine this happening all the time on a fairly busy website.

Now, AWS multi-AZ RDS instances don't have this problem. Yes, the replication is done using a master-master setup, but the second DB instance is a hot spare, meaning writes to it only happen when the other database is down. A multi-region setup with four active masters, however, will have this problem. This is where MySQL's auto_increment_increment comes in. Set this in your my.cnf on the first node:

auto_increment_increment = 4
auto_increment_offset = 1

On the second node:

auto_increment_increment = 4
auto_increment_offset = 2

Et cetera. This will ensure MySQL increases every ID by the number of nodes available (4 in this case), using a starting offset. This means that the IDs on node 1 will be 1,5,9,13,17,... and on node 2 the range will be 2,6,10,14,18,... This also ensures that when a node can't replicate for some reason, it can "catch up" later without generating additional replication errors.

However, this will not prevent other replication errors. Say your application has to create invoices whose IDs must increase by exactly 1 for every invoice (tax regulations in many countries stipulate this). If you use multiple databases, you can't simply take the last invoice ID and add one, because another node might do the same before the changes on the other node have reached it. You also can't use auto_increment, because the invoice numbers wouldn't be consecutive. You will have to design your application around these kinds of issues. One way to go is to set up some kind of invoice queue (SQS can be used for this) that creates these invoices one at a time, making sure the invoice IDs follow up. This is an asynchronous process though, meaning you can't immediately send a response to the client (although you could use NodeJS to simulate it).

Another way to go is to set up an ID distribution agent, to which every node can send a request for a new ID. It makes sure every ID is handed out only once, and it can be set up to support multiple nodes (checking the new ID with the other nodes before giving it out). Do take into account that a MySQL insert can still fail (through omission of a value that can't be NULL, for instance) after the ID has already been granted, so your transaction validation has to be very thorough and you should incorporate a rollback scenario (meaning, for instance, that the ID can be reallocated by the distribution agent).
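
One possible building block for such an agent is the classic MySQL sequence-table pattern (a generic technique, not anything PayPal has confirmed using): LAST_INSERT_ID(expr) makes claiming the next number an atomic operation, even under concurrent access:

-- a single row holds the last invoice number handed out
CREATE TABLE invoice_seq (id BIGINT NOT NULL) ENGINE=InnoDB;
INSERT INTO invoice_seq VALUES (0);

-- atomically claim the next number
UPDATE invoice_seq SET id = LAST_INSERT_ID(id + 1);
SELECT LAST_INSERT_ID();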

If your MySQL replication setup mainly targets failover scenarios, you might not run into these problems, but it's still something to think about.