
23 Jul 2013

Setting up your own dynamic CDN with edge locations using Varnish and SSL

As I mentioned earlier in my post about the new SSL functionality for Amazon Cloudfront, it's possible to set up your own CDN with "edge" locations. I prefer to call them "edgy", though, because we're using Amazon regions rather than the real Amazon edge locations. In return, we gain some flexibility: you only serve from the regions you think you need (thus saving costs), and you can always add your own edge location hosted at a datacenter outside of AWS (for instance, somewhere in Amsterdam).

Please note that I haven't built a proof-of-concept environment for this yet. I'm fairly confident that the setup below will work, but please leave a comment if you disagree.

Basically, what we want is to send visitors to the content location nearest to them. At these locations, we will cache static content, cache dynamic content as much as possible, and remain able to serve content through SSL. Take a look at this sketch for a visual overview of what we'll try to do:

[Sketch: DNS routes each visitor to the nearest edge location, which caches content from the origin]

The DNS

For the DNS, we will of course use Amazon's Route 53. Route 53 can route each client to the endpoint with the lowest latency (read: the "nearest" location), and it can also run health checks so that unhealthy endpoints are taken out of rotation. Read more about latency-based routing in the AWS docs. Set it up to include your edge locations and to monitor their health.
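For illustration, here's a minimal sketch of what those latency-based records could look like using boto3, the AWS SDK for Python. The zone ID, IP addresses and health check IDs are placeholders I made up; substitute your own.

```python
# Hedged sketch: latency-based A records for two hypothetical edge locations.
import boto3

route53 = boto3.client("route53")

EDGES = [
    # (AWS region, edge instance IP, health check ID) -- all hypothetical
    ("eu-west-1", "203.0.113.10", "hc-eu-west-1-id"),
    ("us-east-1", "203.0.113.20", "hc-us-east-1-id"),
]

changes = []
for region, ip, health_check_id in EDGES:
    changes.append({
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "content.example.com.",
            "Type": "A",
            "SetIdentifier": "edge-" + region,   # must be unique per record
            "Region": region,                    # enables latency-based routing
            "TTL": 60,
            "ResourceRecords": [{"Value": ip}],
            "HealthCheckId": health_check_id,    # unhealthy edges are skipped
        },
    })

route53.change_resource_record_sets(
    HostedZoneId="Z_EXAMPLE_ZONE_ID",
    ChangeBatch={"Changes": changes},
)
```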

The Edge locations

This is where it gets interesting. There are a few possible ways to go. You could set up a simple Apache/nginx server to host your content, but then you'd have to worry about keeping copies of your content in sync on every server. That's possible, but not trivial to maintain. Besides, it doesn't give you an easy way to serve dynamic content.

I've chosen a Varnish-based caching solution for this specific use case, because it's very flexible and offers a lot of tweaking options. Besides, Varnish performs well even on a relatively "light" server. Varnish can't handle SSL termination itself, though, so we'll put nginx in front of it as a proxy to offload SSL. You can read how to offload SSL using nginx in their wiki.

Setting up your specific Varnish environment is outside the scope of this article; there are simply too many use cases to cover in one piece. I will give you a few things to consider, though.

Let nginx handle only SSL traffic

Varnish is perfectly capable of handling unencrypted traffic itself, so nginx should listen only on port 443 and proxy decrypted requests to Varnish, which listens on port 80.
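A minimal sketch of the nginx side (the server name and certificate paths are my own placeholders; Varnish is assumed to listen on 127.0.0.1:80):

```nginx
server {
    listen 443 ssl;
    server_name content.example.com;

    ssl_certificate     /etc/nginx/ssl/content.example.com.crt;
    ssl_certificate_key /etc/nginx/ssl/content.example.com.key;

    location / {
        # Hand the decrypted request to Varnish on port 80.
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Let the backend know the original request was HTTPS.
        proxy_set_header X-Forwarded-Proto https;
    }
}
```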

Use auto-scaling

For some, this is a no-brainer. I think it's best practice to always use auto-scaling. You probably won't want to scale out your edge locations, but you do want unhealthy EC2 instances to be terminated and replaced automatically. One thing to consider here is that, normally, a replacement instance will not keep the same IP address. A possible workaround is an ELB, but you'd need one for every region you use, and that would cost more than the instance itself. Another option is to "detach" an Elastic IP on termination and attach it again in the launch sequence of the new EC2 instance. I don't have a ready-to-go script for that solution (yet), but the snippet below sketches the idea.
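Assuming boto3 and an instance role that allows ec2:AssociateAddress, something like this could run from the instance's user data at boot; the allocation ID is a placeholder.

```python
# Hedged sketch: let a replacement instance reclaim its predecessor's
# Elastic IP at boot. Assumes the classic EC2 instance metadata service
# is reachable and the IAM role permits ec2:AssociateAddress.
import urllib.request

import boto3

# Hypothetical allocation ID of the Elastic IP reserved for this edge.
EDGE_EIP_ALLOCATION_ID = "eipalloc-0123456789abcdef0"

# Discover our own instance ID via the instance metadata service.
instance_id = urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
).read().decode()

ec2 = boto3.client("ec2")
ec2.associate_address(
    AllocationId=EDGE_EIP_ALLOCATION_ID,
    InstanceId=instance_id,
    AllowReassociation=True,  # take the EIP over from the terminated instance
)
```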

Consider whether dynamic content can be cached

If you have a session-based website with lots of changing data, it might not pay off to try and cache the dynamic data. If so, use your CDN purely for static content on a separate domain. The CDN step adds latency on cache misses, so if the miss rate is very high, you might be better off querying your origin directly for dynamic content. If, for instance, you use content.example.com as your CDN URL and point Varnish to www.example.com as its origin, you can set your application to use content.example.com for all static file references (images, JavaScript, stylesheets) and www.example.com for all other URLs.
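As a minimal sketch in Varnish 3 VCL (the origin host and the list of file extensions are assumptions), the static/dynamic split could look like this:

```vcl
backend origin {
    .host = "www.example.com";
    .port = "80";
}

sub vcl_recv {
    if (req.url ~ "\.(css|js|png|jpe?g|gif|ico|woff)(\?.*)?$") {
        # Static assets: drop cookies so they can be cached once for everyone.
        unset req.http.Cookie;
        return (lookup);
    }
    # Anything dynamic is passed through to the origin uncached.
    return (pass);
}
```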

Distributing the SSL certificate

Your servers can run solely on ephemeral storage, thanks to the auto-scaling setup. However, one thing that needs to be consistent across your edge locations is the SSL certificate itself. I suggest using S3 for this. Give your instances an IAM role that is allowed to read from the bucket where you store your certificates (a security group won't do here; S3 access is controlled through IAM) and have them pull the necessary files from S3 on launch. This can also be done for the nginx and Varnish config files if you like.
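A minimal sketch of such a launch-time pull with boto3, to run before nginx starts; the bucket name, keys and destination paths are all placeholders:

```python
# Hedged sketch: fetch the certificate and key from S3 at boot. Assumes the
# instance's IAM role allows s3:GetObject on these keys.
import boto3

s3 = boto3.client("s3")

FILES = {
    "certs/content.example.com.crt": "/etc/nginx/ssl/content.example.com.crt",
    "certs/content.example.com.key": "/etc/nginx/ssl/content.example.com.key",
}

for key, destination in FILES.items():
    s3.download_file("my-cdn-secrets-bucket", key, destination)
```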

The origin

The origin can be anything you like; it doesn't even have to be hosted on AWS. It could be an S3 bucket or simply your website on a load-balanced EC2 setup. If, for instance, your origin serves a very heavy PHP website using Apache (like a Magento webshop), you can reduce the load on Apache tremendously by no longer serving all those small static files, leaving it to do only the heavy lifting. I've seen heavy load-balanced setups that could be cut to half their size simply by using a CDN.

23 Jul 2013

Serving dynamic content using Cloudfront

As I mentioned earlier, it's possible to serve dynamic content using Cloudfront. This is wonderful, because it means Cloudfront has grown from being a "simple" CDN into an actual caching solution for your entire website. There are a few things to keep in mind, though.

Misses and hits

Cloudfront is a caching mechanism. In fact, I wouldn't be surprised if it's based on something like the proven Varnish. So it works with misses and hits. If it doesn't find the content a visitor is looking for, that counts as a miss and triggers a request to the origin server; the fetched content is then cached. If your miss rate is very high, your website will in fact be slower for most of your visitors. If you can't properly cache your website, you're probably better off not using Cloudfront. A hit, however, is very fast. Some websites achieve a hit rate of over 99%, which means they serve almost every visitor from an edge location while the origin remains nearly idle.

Cookies and sessions

If your website serves content that is visitor-specific (like shopping carts or an account page), you will have to tell Cloudfront which cookies identify the session. If you don't, the cart of the first user who visits a page will be cached and shown to every other user. If Cloudfront knows about these session cookies, it can store a version of the page for each individual visitor. If you display the shopping cart on every page, though, this adds overhead you'll want to avoid; in that case, loading the cart in a separate AJAX request might be a better way to go, so that the majority of the page can be cached once for all users while the website retains its dynamic nature.
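For illustration, this is roughly what the relevant fragment of a cache behavior looks like in the dict format boto3 uses for a Cloudfront distribution config; the PHPSESSID cookie name is an assumption, so use whatever identifies sessions in your application:

```python
# Hedged sketch: fragment of a CloudFront cache behavior (e.g. inside
# DefaultCacheBehavior) that whitelists one session cookie.
cache_behavior_fragment = {
    "ForwardedValues": {
        "QueryString": True,
        "Cookies": {
            "Forward": "whitelist",  # forward (and vary the cache on) only listed cookies
            "WhitelistedNames": {
                "Quantity": 1,
                "Items": ["PHPSESSID"],  # hypothetical session cookie name
            },
        },
    },
}
```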

Page expiry

You can handle expiration of pages entirely from the origin of your website. By default, Cloudfront assumes your objects or pages expire after 24 hours and will check for updates once a day. If your website is pretty much static, this may be fine: every page has one "slow hit" per day and that's it. However, many websites need a much lower setting because the underlying data changes. You can set a max-age in the Cache-Control header of every page and Cloudfront will respect it. So if you have a very busy blog with commenting functionality, you could, for instance, set the max-age to 600 seconds (10 minutes) on your homepage and to 10 seconds on your post pages. If every post is visited every second, that reduces load time for 90% of your visitors. But in this case, you could also consider loading comments through AJAX, which lets you raise the expiration time (in fact, it could then be very high for posts).
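As a minimal sketch of setting those headers at the origin, here's a bare WSGI app with a made-up URL scheme (/ for the homepage, /posts/* for posts); your framework of choice will have its own way to do the same:

```python
# Hedged sketch: per-page Cache-Control headers, as described above.
def application(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        max_age = 600  # homepage: new posts should appear within 10 minutes
    elif path.startswith("/posts/"):
        max_age = 10   # post pages: fresh comments within 10 seconds
    else:
        max_age = 60   # everything else: a modest default
    body = b"Hello from the origin\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        ("Cache-Control", "max-age=%d" % max_age),
    ])
    return [body]
```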

SSL

If you have an e-commerce website, you probably handle (parts of) user requests through SSL. Custom SSL domains for Cloudfront come at a price, though, so you'll want to think this through. An option might be to use a URL like secure.example.com for your encrypted pages and send those requests directly to your origin, while serving "unsecure" pages through Cloudfront.

14 Jul 2013

Amazon AWS Cloudfront now supports custom SSL domains, will we use it?

As you may know, Amazon Cloudfront is a great service that lets you serve both static and dynamic content from an edge location near the end user. I'll do an article on how to optimize Cloudfront for dynamic content later, but first I'd like to talk about a new feature that Amazon presented a month ago.

If you're using Cloudfront as a traditional CDN, you'll probably have a CNAME configured at content.yourdomain.com or static.yourdomain.com or similar, pointing to Cloudfront. For websites running on plain HTTP, that's perfectly fine. For HTTPS, however, this wasn't possible until a month ago: Amazon didn't provide customers with a way to upload an SSL certificate for their CDN domain.

However, that has changed. As of mid-June 2013, Amazon supports what it calls "custom SSL certificates", basically enabling you to upload your own SSL certificate, which will be distributed across all edge locations.

There is a downside, though, which is the cost of this feature. It amounts to a whopping $600 per certificate per month (pro-rated by the hour, of course). For us, this would mean a 40% increase in the cost of our entire AWS infrastructure, which is why we opted not to implement it. We'll keep using our nginx-based EC2 server as our CDN. We'd love to serve our static content from edge locations, but not at a 40% cost increase.

If you don't mind using a .cloudfront.net subdomain for your static content, you can of course use Amazon's wildcard SSL certificate at a slightly higher rate per 10,000 requests. For many companies, this will do fine.

Update: Amazon has updated its announcement to explain the high cost of this feature. They state the following:

Some of you have expressed surprise at the price tag for the use of SSL certificates with CloudFront. With this custom SSL certificate feature, each certificate requires one or more dedicated IP addresses at each of our 40 CloudFront locations. This lets us give customers great performance (using all of our edge locations), great security (using a dedicated cert that isn’t shared with anyone else) and great availability (no limitations as to which clients or browsers are supported). As with any other CloudFront feature, there are no up-front fees or professional services needed for setup. Plus, we aren’t charging anything extra for dynamic content, which makes it a great choice for customers who want to deliver an entire website (both static and dynamic content) in a secure manner.

The thing is, for $600 per month, I could rent more than 40 on-demand micro instances, each with its own Elastic (dedicated) IP. If you spread 8 heavy-utilization reserved small instances over all major regions, you could use Route 53's latency-based routing, and it would probably cost you less than $150 per month (traffic not included). Latency might not be as low as with CloudFront, but it's definitely something I'd consider if a client wants to lower their global latency.

I'll do a post about this as well in the near future.
