Jaap Haagmans The all-round IT guy

7 Aug 2013

EC2 performance and cost vs dedicated servers and in-house solutions

When it comes to performance and cost/benefit analysis, AWS has had to endure quite some criticism over the years. The pricing structure of AWS, though transparent, is often labelled as steep and unfavourable to steady, long-term infrastructures. I agree to some extent, but infrastructures are rarely steady. I've seen companies splash cash on hardware that ran at 10% utilization over its lifetime. I've also seen companies grow faster than their infrastructure allowed, forcing them to make a second investment and take a big write-off on their recently bought hardware. If you want to avoid these situations, you need to plan ahead and hope you don't catch up with the future too soon. Or you'll have to go out and buy a crystal ball.

For people who simply can't plan that far ahead, virtualisation provides a middle ground. Provided your contracts are flexible, you can, for instance, scale up your Exchange server at a moment's notice, with minimal downtime. AWS goes a step further by letting you control the resources yourself, which allows you to plan around your own schedule.

Many people argue that the services other than bare EC2 are expensive. This is mainly because AWS provides an extra level of service. With EC2, you're responsible for everything that happens on your server (no matter what kind of support agreement you have). If you rent an RDS instance though, AWS also takes responsibility for the software layer. When you compare a large EC2 instance to a large RDS instance, you'll see that the resources provided are comparable, but the price of the RDS instance is 8 cents per hour higher (in the EU region). Now, if you're comfortable managing your own MySQL instance, you're probably better off running MySQL on an EC2 instance. And that goes for almost every service AWS provides. You can even set up your own load balancers if you'd like. Or, like I argued before, it's possible to set up your own distributed CDN.
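To put those 8 cents in perspective: over a 720-hour month that adds up to roughly $58.- per instance, which is effectively what you're paying AWS to manage MySQL for you.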

Performance

So, let's take a look at just the real building blocks: EC2 instances. How do they perform? And how do they compare to our in-house solutions?

For this comparison, I'm taking a look at some benchmarks run on an m1.large instance. It's said to have 7.5 GiB of memory, 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each), 850 GB of local instance storage and a 64-bit platform. Does that mean anything to you? Well, it doesn't to me. How does an EC2 Compute Unit (ECU) relate to a real CPU, for example? And 7.5 GiB of memory sounds great, but if all memory buses on the underlying server are stuffed with slow 8 GB modules (for a total of 64 GB of RAM), it probably doesn't compare to a dedicated server with 4x 2 GB of DDR3 RAM. We all know that slow RAM can be deadly for overall performance. So, let's do a benchmark!

Yes, I know some of you will say that benchmarks are the root of all evil. They can't be trusted. A benchmark today says nothing about a benchmark tomorrow. And you're probably right. But I just want to know what ballpark we're in. To find out, I'm using sysbench on a large EC2 instance running the 64-bit Amazon Linux AMI.
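If you want to run this yourself: sysbench isn't installed on the Amazon Linux AMI by default. Assuming the EPEL repository (which the Amazon Linux AMI ships with, but leaves disabled) still carries a sysbench package, something like this should get you going:

sudo yum --enablerepo=epel install -y sysbench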

CPU

[ec2-user@ip-10-0-0-17 ~]$ sysbench --test=cpu --cpu-max-prime=20000 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
 
Running the test with following options:
Number of threads: 1
 
Doing CPU performance benchmark
 
Threads started!
Done.
 
Maximum prime number checked in CPU test: 20000
 
 
Test execution summary:
    total time:                          36.1470s
    total number of events:              10000
    total time taken by event execution: 36.1343
    per-request statistics:
         min:                                  3.57ms
         avg:                                  3.61ms
         max:                                  4.72ms
         approx.  95 percentile:               3.71ms
 
Threads fairness:
    events (avg/stddev):           10000.0000/0.00
    execution time (avg/stddev):   36.1343/0.00
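Note that sysbench runs with a single thread by default, so the numbers above only exercise one of the two virtual cores. If you want to see what the instance does with both cores busy, you can raise the thread count:

sysbench --test=cpu --cpu-max-prime=20000 --num-threads=2 run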

I/O
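One thing to note if you want to reproduce this: the fileio test works on a set of test files that sysbench has to create first (and that you can remove afterwards), so the run below is wrapped in a prepare and a cleanup step:

sysbench --test=fileio --file-total-size=1G prepare
sysbench --test=fileio --file-total-size=1G cleanup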

[ec2-user@ip-10-0-0-17 ~]$ sysbench --test=fileio --file-total-size=1G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
 
Running the test with following options:
Number of threads: 1
Initializing random number generator from timer.
 
 
Extra file open flags: 0
128 files, 8Mb each
1Gb total file size
Block size 16Kb
Number of random requests for random IO: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test                                                                                                                                                                           
Threads started!                                                                                                                                                                                
Time limit exceeded, exiting...                                                                                                                                                                 
Done.                                                                                                                                                                                           
 
Operations performed:  259620 Read, 173080 Write, 553828 Other = 986528 Total
Read 3.9615Gb  Written 2.641Gb  Total transferred 6.6025Gb  (22.536Mb/sec)
 1442.33 Requests/sec executed
 
Test execution summary:
    total time:                          300.0010s
    total number of events:              432700
    total time taken by event execution: 5.9789
    per-request statistics:
         min:                                  0.01ms
         avg:                                  0.01ms
         max:                                  0.16ms
         approx.  95 percentile:               0.02ms
 
Threads fairness:
    events (avg/stddev):           432700.0000/0.00
    execution time (avg/stddev):   5.9789/0.00

Do bear in mind that this is network-attached storage (EBS) and therefore not comparable to a physical disk in a server when it comes to response times. And yes, I know that's outside the EC2 scope, but Amazon actually recommends against using ephemeral drives for almost anything, so EBS performance is probably what anyone will be looking at anyway. And I'm all about my readers (yes, all 3 of them. Hi mom!).

Memory

[ec2-user@ip-10-0-0-17 ~]$ sysbench --test=memory --memory-block-size=1M --memory-total-size=7G run
sysbench 0.4.12:  multi-threaded system evaluation benchmark
 
Running the test with following options:
Number of threads: 1
 
Doing memory operations speed test
Memory block size: 1024K
 
Memory transfer size: 7168M
 
Memory operations type: write
Memory scope type: global
Threads started!
Done.
 
Operations performed: 7168 ( 3649.63 ops/sec)
 
7168.00 MB transferred (3649.63 MB/sec)
 
 
Test execution summary:
    total time:                          1.9640s
    total number of events:              7168
    total time taken by event execution: 1.9552
    per-request statistics:
         min:                                  0.27ms
         avg:                                  0.27ms
         max:                                  0.47ms
         approx.  95 percentile:               0.29ms
 
Threads fairness:
    events (avg/stddev):           7168.0000/0.00
    execution time (avg/stddev):   1.9552/0.00
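By default sysbench benchmarks memory writes. If you also want to see read performance, you can switch the operation type (and, as with the CPU test, this is single-threaded unless you raise --num-threads):

sysbench --test=memory --memory-block-size=1M --memory-total-size=7G --memory-oper=read run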

Interpreting the results

Well, this is nice and all, but what does it mean? First of all, it tells me that the CPU doesn't disappoint. It's a little slower than the quad-core 2.53 GHz processor I've tested locally, which on average does 2.85 ms per request, but the 4 ECUs are made up of 2 "virtual cores", so I expected performance somewhere near that of a dual-core processor. I just don't have one available to test against.

I was actually amazed by the I/O results. I calculated roughly 1,400 IOPS and a throughput of 22.5 MB/s! Compare that to my 7200 RPM SATA disk, which struggles to reach 200 IOPS and a throughput of 3.2 MB/s.

The memory let me down a little. The system I tested against has 4 GB of DDR3 RAM and manages about 4,700 ops/s, while the m1.large instance gets about 3,650 ops/s. It's still quite good though, considering it's shared memory.

Cost comparison

The m1.large instance isn't cheap. If you want an on-demand instance running for a month, it will set you back about $190.- in the EU region. However, a heavy utilization reserved instance might be a good choice for many, and then it can go as low as $60.- per month (one-time fee included). Buying a dual-core server with 8 GB of RAM will, at the moment, set you back around $700.-. Calculating conservatively, power will cost you about $350.- per year (at 15 cents per kWh, which is nowhere near consumer prices in the EU), meaning running this server for 3 years will cost you $1750.-, not counting maintenance, cooling and possible breakdown. The EC2 instance will have cost you $2160.-. And if it breaks down, you can have a new one running in under 2 minutes.
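For those keeping score, the back-of-the-envelope numbers over 3 years look roughly like this:

Dedicated server: $700 hardware + 3 x $350 power = $1750
EC2 m1.large (heavy reserved): 36 x $60          = $2160
Difference over 3 years                          = $410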

Now, tell me: if someone offered to install and maintain a physical server for you for 3 years at $400.-, would you go for it? I would. That's roughly the premium you're paying for the m1.large instance over those 3 years.

23 Jul 2013

Setting up your own dynamic CDN with edge locations using Varnish and SSL

As I mentioned earlier in my post about the new SSL functionality for Amazon CloudFront, it's possible to set up your own CDN with "edge" locations. I prefer calling them edgy though, because we're using Amazon regions and not the real CloudFront edge locations. Still, it gives us some flexibility: you only serve from the regions you think you need (thus saving costs), and you can always add your own edge location hosted at a datacenter outside of AWS (for instance, somewhere in Amsterdam).

Please bear in mind that I haven't built the POC environment for this yet. I'm fairly confident that the setup below will work, but please comment if you disagree.

Basically, what we want is to send visitors to the content location nearest to them. At these locations, we will cache static content and try to cache dynamic content as much as possible, while still being able to serve content over SSL. Take a look at this sketch for a visual overview of what we're trying to do:

[Diagram: DNS directing visitors to the nearest edge location, which in turn fetches from the origin]

The DNS

For the DNS we will, of course, use Amazon's Route 53. Route 53 can route clients to the endpoint with the lowest latency (read: usually the location nearest to them), and it can also run health checks against those endpoints. Read more about latency-based routing in the AWS docs. Set it up to include all your edge locations and monitor their health.
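Just to sketch what such a record looks like using the AWS CLI (the hosted zone ID, health check ID and IP address below are made up), a latency-based record for an edge location in eu-west-1 could be created roughly like this:

aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE --change-batch '{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "content.example.com",
      "Type": "A",
      "SetIdentifier": "edge-eu-west-1",
      "Region": "eu-west-1",
      "TTL": 60,
      "HealthCheckId": "abcd1234-example",
      "ResourceRecords": [{"Value": "203.0.113.10"}]
    }
  }]
}'

You'd create one record set per edge location, each with its own SetIdentifier, Region and health check.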

The Edge locations

This is where it gets interesting. There are a few possible ways to go. You could set up a simple Apache or nginx server to host your content, but then you'd have to worry about keeping copies of your content on every server. It's possible, but it isn't easy to maintain. Besides, it doesn't give you an easy way to serve dynamic content.

I've chosen a Varnish-based caching solution for this specific use case, because it's very flexible and provides a lot of tuning options. Besides, Varnish performs well on a relatively "light" server. Varnish can't handle SSL termination though, so we'll put nginx in front of it as a proxy to offload SSL. You can read how to offload SSL using nginx in the Varnish wiki.
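To give you an idea of what the nginx side could look like (the domain name and certificate paths are just examples), a minimal SSL-offloading proxy that hands everything to Varnish on port 80 would be something along these lines, written out as part of a launch script:

cat > /etc/nginx/conf.d/ssl-offload.conf <<'EOF'
server {
    listen 443 ssl;
    server_name content.example.com;

    ssl_certificate     /etc/nginx/ssl/content.example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/content.example.com.key;

    location / {
        # Hand the decrypted request to Varnish on the same box
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
EOF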

Setting up your specific Varnish environment is outside the scope of this article; there are simply too many use cases to cover in a single post. I will give you a few things to consider though.

Let nginx handle only SSL traffic

Varnish is perfectly capable of handling unencrypted traffic itself, so nginx should only listen on port 443, while Varnish listens on port 80.
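As a sketch (the VCL path and cache size are just examples), the Varnish daemon on such an edge node would then be started along these lines, holding port 80 while nginx holds 443:

varnishd -a 0.0.0.0:80 -f /etc/varnish/default.vcl -s malloc,512m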

Use auto-scaling

For some, this is a no-brainer. I think it's best practice to always use auto-scaling. You will probably not want to scale out your edge location, but you do want to automatically terminate unhealthy EC2 instances and fire up new ones. Something to consider here is that, normally, a replacement instance will not keep the same IP address, so you'll have to work around that. A possible workaround is using an ELB, but you'd need one for every region you're using and that will cost you more than the instance itself. Another option is to "detach" the Elastic IP on termination and attach it again during the launch sequence of the new EC2 instance, but I don't have a ready-to-go script for that solution (yet); a rough sketch of the idea follows below.
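Something like the following launch-time script should cover the attach step, assuming the instance has an IAM role that allows ec2:AssociateAddress and that eipalloc-12345678 is the allocation ID of the Elastic IP reserved for this edge location (both are placeholders):

#!/bin/bash
# Look up our own instance ID via the instance metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Attach this edge location's Elastic IP to the freshly launched instance
aws ec2 associate-address \
  --region eu-west-1 \
  --instance-id "$INSTANCE_ID" \
  --allocation-id eipalloc-12345678 \
  --allow-reassociation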

Consider whether dynamic content can be cached

If you have a session-based website with lots of changing data, it might not pay off to try to cache the dynamic data. If so, use your CDN purely for the static content on a separate domain. The CDN step adds latency on cache misses, so if the miss rate is very high, you might be better off querying your server directly for dynamic content. If, for instance, you use content.example.com as your CDN URL and point Varnish to www.example.com as your origin, you can set your application to use content.example.com as the domain for all static file references (images, JavaScript, stylesheets) and www.example.com for all other URLs.
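To make that a bit more concrete, a stripped-down Varnish (3.x) config for such an edge node might look like this, again written as part of a launch script; the origin hostname and the list of static extensions are just examples:

cat > /etc/varnish/default.vcl <<'EOF'
backend origin {
    .host = "www.example.com";
    .port = "80";
}

sub vcl_recv {
    # Static files: drop cookies so Varnish can cache them
    if (req.url ~ "\.(png|gif|jpg|jpeg|css|js|ico)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_fetch {
    # Cache static responses at the edge for a day
    if (req.url ~ "\.(png|gif|jpg|jpeg|css|js|ico)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 24h;
    }
}
EOF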

Distributing the SSL certificate

Your servers can run solely on ephemeral storage, thanks to the auto-scaling setup. However, one thing that needs to be consistently spread across your endpoints is the SSL certificate itself. I suggest using S3 for this. Launch your instances with an IAM role that is allowed to read from the bucket where you store your certificates and have them pull the necessary files from S3 on launch. This can also be done for the nginx and Varnish config files if you like.
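As a sketch of that launch step (the bucket and file names are made up, and this assumes the AWS CLI is available on the instance):

# Pull the SSL certificate and key from a private S3 bucket at boot
aws s3 cp s3://my-cdn-config/ssl/content.example.com.crt /etc/nginx/ssl/ --region eu-west-1
aws s3 cp s3://my-cdn-config/ssl/content.example.com.key /etc/nginx/ssl/ --region eu-west-1
chmod 600 /etc/nginx/ssl/content.example.com.key
service nginx restart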

The origin

The origin can be anything you like; it doesn't even have to be hosted on AWS. It can be an S3 bucket or simply your website on a load-balanced EC2 setup. If, for instance, your origin serves a very heavy PHP website using Apache (like a Magento webshop), you will reduce the load on Apache tremendously, because it no longer has to serve all those small static files and can focus on the heavy lifting. I've seen heavy load-balanced setups that could be reduced to half their size simply by using a CDN.