Jaap Haagmans The all-round IT guy


Optimizing Magento to become a speed monster

The post title might be a bit bold, but, contrary to popular belief, it's in no way impossible to have a blazingly fast Magento store. Most of the gains aren't quick fixes, though; some of the changes will require quite a bit of time.

MySQL query caching

MySQL isn't optimized for Magento out of the box. There's one general mechanism that can make a world of difference for Magento: the query cache. Experiment with the following settings and measure the results (e.g. using New Relic):

query_cache_type = 1
query_cache_size = 256M

This will enable MySQL to store 256M of query results, so that the most common queries (like the ones loading your frontpage products) can be served straight from memory. You can evaluate the actual query cache usage by inspecting MySQL's Qcache status counters.
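The usual way to inspect those counters is MySQL's status output (the exact numbers will of course differ per installation):

```sql
-- Inspect query cache usage; Qcache_free_memory and
-- Qcache_lowmem_prunes are the counters to watch.
SHOW GLOBAL STATUS LIKE 'Qcache%';
```

If Qcache_lowmem_prunes keeps climbing, the cache is evicting results to make room and a larger query_cache_size may help.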


A production database should show some free memory for the query cache. Increase the size if it runs out.

PHP Opcode caching

Opcode caching is a mechanism that enables you to store the compiled opcode of your PHP files in shared memory. This reduces access time and eliminates PHP parsing time, meaning PHP files can be executed faster. For Magento, this could easily reduce the time needed for certain actions by seconds. My favourite opcode caching mechanism is APC, simply because it's maintained by the folks at PHP.
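As a sketch, an APC configuration for a Magento-sized codebase might look like this in php.ini (the values are illustrative starting points, not gospel):

```ini
extension=apc.so
; Enough shared memory to hold the compiled Magento codebase
apc.shm_size=256M
; Magento ships thousands of PHP files; raise the hint accordingly
apc.num_files_hint=10000
; Set apc.stat=0 in production only if you clear the cache on every deploy
apc.stat=1
```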

Magento Compilation

Even though recent developments have reduced the need for the Magento compiler, there's still a (small) performance gain to be had, if you don't mind having to recompile your installation after you make changes to its files. The Magento compiler concatenates and copies PHP class files into a single folder, which reduces the time Magento has to spend searching the filesystem.
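For Magento 1.x, the compiler can be driven from the command line in the Magento root; just remember to disable compilation before deploying code changes and to recompile afterwards:

```
# Check whether compilation is enabled and compiled
php -f shell/compiler.php -- state
# Compile: copies/concatenates class files into includes/src
php -f shell/compiler.php -- compile
# Before deploying changes, disable it again
php -f shell/compiler.php -- disable
```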

Store Magento cache in memory

By default, Magento will store its cache files under var/cache. Now, let me first point out that caching should always be enabled for performance reasons (simply because it reduces disk I/O and database calls). However, storing these cache files on the filesystem still incurs I/O overhead. If your server is backed by SSD disks, this overhead is pretty small, but if not, you can gain a lot by storing this cache in shared memory. Magento supports memcached out-of-the-box (although your server of course needs memcached installed), but I recently switched to Redis using an extension called Cm_Cache_Backend_Redis. We run it on a separate server because we have multiple webservers, but you might not need to.
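For reference, pointing Magento's cache at Redis through Cm_Cache_Backend_Redis comes down to a snippet along these lines in app/etc/local.xml (the server and port are placeholders for your own setup):

```xml
<config>
  <global>
    <cache>
      <backend>Cm_Cache_Backend_Redis</backend>
      <backend_options>
        <server>127.0.0.1</server> <!-- or your dedicated Redis server -->
        <port>6379</port>
        <database>0</database>
        <compress_data>1</compress_data>
      </backend_options>
    </cache>
  </global>
</config>
```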

Use SSD disks

Magento is quite I/O heavy. Where an IDE drive will only go as fast as 150-200 IOPS and degrades rapidly over time, an SSD disk can easily do 50 times as many IOPS or even more. If I/O is your bottleneck, using SSD is the way to go.

Use front end caching

I know that there are Full Page Caching modules available for Magento, but I recommend dropping them in favour of a front end caching mechanism. Varnish is probably the way to go at this time. The main reason to go for front end caching is that it's highly configurable.

Varnish stores a copy of the webserver's response in shared memory. If another visitor visits the same page, Varnish will serve the page from memory, taking the load off the webserver. Because some pages are dynamic or have dynamic parts, it's important to configure Varnish so that it only caches content that is (more or less) static. Varnish also supports ESI, which enables you to pass through blocks of content from the webserver. If you use a Magento extension to enable Varnish caching, it will do most of the configuration for you (although extensive testing is required).
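To give an idea of what that configuration looks like, a stripped-down vcl_recv might pass dynamic Magento URLs straight to the webserver and cache the rest. This is a hedged sketch in Varnish 3 syntax; the URL patterns are examples and your store will need its own list:

```vcl
sub vcl_recv {
    # Never cache cart, checkout, customer or admin pages
    if (req.url ~ "^/(checkout|customer|wishlist|admin)") {
        return (pass);
    }
    # Static assets: strip cookies so they become cacheable
    if (req.url ~ "\.(css|js|png|jpg|gif|ico)$") {
        unset req.http.Cookie;
    }
    return (lookup);
}
```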

There are two extensions I have tried, the first one being PageCache. I prefer Turpentine though, because it's more powerful and supports ESI.


EC2 performance and cost vs dedicated servers and in-house solutions

When it comes to performance and cost/benefit analysis, AWS has had to endure quite some criticism over the years. The pricing structure of AWS, though transparent, is often labelled as steep and unfavourable to steady, long-term infrastructures. I agree up to a point, but infrastructures are rarely steady. I've seen companies splash cash on hardware that was utilized at 10% during its lifetime. I've also seen companies grow faster than their infrastructure allowed, forcing them to make a second investment and concede a big write-off on their recently bought hardware. If you want to avoid these situations, you need to plan ahead and hope you don't catch up with the future too soon. That, or go shopping for a crystal ball.

For people who simply can't plan that far ahead, virtualisation provides middle ground. Given that your contracts are flexible you can, for instance, scale up your Exchange server at a moment's notice, with minimal downtime. AWS goes a step further, enabling you to control the resources yourself, thus giving you the possibility to plan around your own schedule.

Many people argue that the services other than bare EC2 are expensive. That's mainly because AWS provides an extra level of service. With EC2, you're responsible for everything that happens on your server (no matter what kind of support level agreement you have). If you rent an RDS instance though, AWS also takes responsibility for the software layer. When you compare a large EC2 instance with a large RDS instance, you'll see that the resources provided are comparable, but the price of an RDS instance is 8 cents per hour higher (in the EU region). Now, if you're comfortable managing your own MySQL instance, you're probably better off running MySQL on an EC2 instance. And that goes for almost every service AWS provides. You can even set up your own load balancers if you'd like. Or, as I argued before, it's possible to set up your own distributed CDN.


So, let's take a look only at the real building blocks: EC2 instances. How do they perform? And how does that compare to our in-house solutions?

For this comparison, I'm taking a look at some benchmarks taken on an m1.large instance. It's said to have 7.5 GiB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage and a 64-bit platform. Does that mean anything to you? It doesn't to me. How does an EC2 Compute Unit (ECU) relate to a real CPU, for example? And 7.5 GiB of memory sounds great, but if all memory buses on the server are stuffed with slow 8 GB RAM modules (for a total of 64 GB RAM), it probably doesn't compare to a dedicated server with 4x 2 GB DDR4 RAM. We all know that slow RAM can be deadly for general performance. So, let's do a benchmark!

Yes, I know that some of you will say that benchmarks are the root of all evil. They can't be trusted. A benchmark today says nothing about a benchmark tomorrow. And you're probably right. But I just want to know the ballpark we're in. So to do that, I'm using sysbench on a large EC2 instance running the 64 bit Linux AMI.


[ec2-user@ip-10-0-0-17 ~]$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing CPU performance benchmark
Threads started!
Maximum prime number checked in CPU test: 20000
Test execution summary:
    total time:                          36.1470s
    total number of events:              10000
    total time taken by event execution: 36.1343
    per-request statistics:
         min:                                  3.57ms
         avg:                                  3.61ms
         max:                                  4.72ms
         approx.  95 percentile:               3.71ms
Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   36.1343/0.00


[ec2-user@ip-10-0-0-17 ~]$ sysbench --test=fileio --file-total-size=1G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Initializing random number generator from timer.
Extra file open flags: 0
128 files, 8Mb each
1Gb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test                                                                                                                                                                           
Threads started!                                                                                                                                                                                
Time limit exceeded, exiting...                                                                                                                                                                 
Operations performed:  259620 Read, 173080 Write, 553828 Other = 986528 Total
Read 3.9615Gb  Written 2.641Gb  Total transferred 6.6025Gb  (22.536Mb/sec)
 1442.33 Requests/sec executed
Test execution summary:
    total time:                          300.0010s
    total number of events:              432700
    total time taken by event execution: 5.9789
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.01ms
         max:                                  0.16ms
         approx.  95 percentile:               0.02ms
Threads fairness:
    events (avg/stddev):           432700.0000/0.00
    execution time (avg/stddev):   5.9789/0.00

Do note that this is network-attached storage (EBS), so it's not comparable to a physical disk in a server when it comes to response times. And yes, I know that's outside the EC2 scope, but Amazon actually recommends against using ephemeral drives for almost anything, so EBS performance is probably what anyone will be looking at anyway. And I'm all about my readers (yes, all 3 of them. Hi mom!).


[ec2-user@ip-10-0-0-17 ~]$ sysbench --test=memory --memory-block-size=1M --memory-total-size=7G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 1
Doing memory operations speed test
Memory block size: 1024K
Memory transfer size: 7168M
Memory operations type: write
Memory scope type: global
Threads started!
Operations performed: 7168 ( 3649.63 ops/sec)
7168.00 MB transferred (3649.63 MB/sec)
Test execution summary:
    total time:                          1.9640s
    total number of events:              7168
    total time taken by event execution: 1.9552
    per-request statistics:
         min:                                  0.27ms
         avg:                                  0.27ms
         max:                                  0.47ms
         approx.  95 percentile:               0.29ms
Threads fairness:
    events (avg/stddev):           7168.0000/0.00
    execution time (avg/stddev):   1.9552/0.00

Interpreting the results

Well, this is nice and all, but what does it mean? First of all, it tells me that the CPU doesn't disappoint. It's a little slower than the quad-core 2.53 GHz processor I've tested locally, which on average does 2.85 ms per request, but the 4 ECUs are built up out of 2 "virtual cores", so I expected performance to be somewhere near that of a dual-core processor. I just don't have one available.

I was actually amazed by the I/O results: over 1,400 IOPS and a throughput of 22.5 MB/s! Compare that to my 7200 RPM SATA disk, which struggles to reach 200 IOPS and a throughput of 3.2 MB/s.

The memory let me down a little. The system I tested with has 4 GB DDR3 RAM and manages to get 4,700 ops/s, while the m1.large instance gets 3,700 ops/s. That's still quite good though, considering it's shared memory.

Cost comparison

The m1.large instance isn't cheap. If you want an on-demand instance running for a month, it will set you back $190.- in the EU region. However, going for a heavy reserved instance might be a good choice for many, and then it can go as low as $60.- per month (one-time fee included). Buying a dual-core server with 8 GB RAM will, at the moment, set you back around $700.-. Calculating conservatively, power will cost you about $350.- per year (at 15 cents per kWh, which is nowhere near consumer prices in the EU), meaning running this server for 3 years will cost you $1750.-, not counting maintenance, cooling and possible breakdown. The EC2 instance will have cost you $2160.- over those same 3 years. And if it breaks down, you can have a new one running in under 2 minutes.
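The back-of-the-envelope numbers above can be reproduced in a few lines of shell (prices as quoted above, rounded):

```shell
# 3 years of a heavy reserved m1.large at roughly $60/month (one-time fee amortized)
ec2=$((60 * 36))
# Dedicated box: $700 purchase plus about $350/year in power
dedicated=$((700 + 350 * 3))
echo "EC2: \$${ec2}, dedicated: \$${dedicated}, premium: \$$((ec2 - dedicated))"
# prints: EC2: $2160, dedicated: $1750, premium: $410
```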

Now, tell me: if someone told you it costs $400.- to install and maintain a physical server for 3 years, would you go for it? I would.


Increasing EBS performance and fault tolerance using RAID

Even though I'd normally say you should consider your EC2 instances and EBS data disposable, that's not always possible. There are setups imaginable that simply cannot use S3 for their "dynamic" file storage (e.g. due to legacy software packages that depend heavily on filesystem storage). In those situations, snapshots alone might not be sufficient, as the downtime while restoring them might be quite high.

Increasing performance of EBS

EBS performance is often increased using RAID0, also called striping. Data is distributed over multiple volumes, increasing I/O capabilities. In fact, you can scale your RAID0 setup to up to 16 drives on Windows or even more on Linux. Many AWS users are employing this technique and are reporting it to be quite performant.
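On Linux this is typically done with mdadm; as a sketch, with two EBS volumes attached as /dev/xvdf and /dev/xvdg (the device names and mount point are examples):

```
# Stripe two EBS volumes into one RAID0 array
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg
# Put a filesystem on the array and mount it
mkfs.ext4 /dev/md0
mount /dev/md0 /data
```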

What should worry you, if the first part of this post applies to you, is that if one EBS drive somehow fails, the entire RAID0 array fails with it, effectively corrupting all data on it. If this doesn't worry you (and it might not, since many setups on AWS aren't filesystem-dependent), you're free to go; the rest of this post doesn't apply to you. However, I know there are people out there who will be -very- worried by this.

Before I go on, I'd like to note that Adrian Cockcroft mentions they only use 1TB EBS volumes to reduce (or maybe even eliminate) multi-tenancy, which will generate more consistent I/O results.

Increasing fault tolerance of EBS volumes

Amazon states that EBS volumes are 99.5-99.9% reliable over any given year. Compared to a regular physical drive, that's an impressive number. However, it might not be enough for you. You'd probably think RAID1 can solve that. According to Amazon, you're wrong: EBS volumes are replicated within an Availability Zone, meaning that if the physical hardware behind your EBS volume goes down, your EBS volume will persist somewhere else in the AZ. So RAID1 will not reduce the chance that you lose your data (technically, this isn't entirely true, but let's not go into that).

However, there's something Amazon seems to overlook. An EBS volume might underperform from time to time. If you don't use RAID1, you will have to just wait it out (or build a new volume from a snapshot). If you do use RAID1, you can quickly swap the EBS volume for a brand new one and rebuild the RAID1 array. That gives you complete control!
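With mdadm, swapping out an underperforming member of a RAID1 array is a three-step affair (assuming /dev/md0 is the array, /dev/xvdf the slow volume and /dev/xvdh its freshly attached replacement; all names are examples):

```
# Mark the slow volume as failed and remove it from the array
mdadm /dev/md0 --fail /dev/xvdf --remove /dev/xvdf
# Add the new EBS volume; the array rebuilds in the background
mdadm /dev/md0 --add /dev/xvdh
# Watch the rebuild progress
cat /proc/mdstat
```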

I myself use RAID10, to combine the advantages of RAID1 and RAID0. But it's something you'll have to figure out for yourself. In fact, in some cases RAID1 might outperform RAID0 (especially when looking at random reads). However, RAID1 writes are always slower than RAID0 writes.

Resilient filesystems

I will get back to this after we're done setting it up, but we're working on moving to Gluster for centralized file storage. We're currently using a robust NFS solution to mount a webroot volume on our webservers, but it's still a single point of failure. Gluster provides us with a way to set up a resilient, near-endlessly scalable cluster for file storage. Our plan is to build it on top of RAID10 EBS volumes and replicate across Availability Zones.
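As a rough sketch of where we're headed, a two-node replicated Gluster volume is set up along these lines (hostnames, brick paths and the volume name are placeholders):

```
# On web-1: add the second node to the trusted pool
gluster peer probe web-2
# Create a replicated volume backed by one brick on each node
gluster volume create webroot replica 2 web-1:/bricks/webroot web-2:/bricks/webroot
gluster volume start webroot
# On the webservers, mount it with the native client
mount -t glusterfs web-1:/webroot /var/www
```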

In any case, EBS performance shouldn't be too big of an issue. Yes, the latency might not be ideal for every use case, but if that forms a real issue, you're probably better off renting a dedicated server solution anyway.
