Jaap Haagmans The all-round IT guy


Expanding the reach of a WiFi network

I live in a modern home, where all the load-bearing walls and floors are made of reinforced concrete. Our internet connection enters the house on the ground floor, where our WiFi router and my office are located, while we sleep on the second floor (or the third floor if we lived in the US or Russia). The (first-world) problem with this is that we have almost no WiFi signal in our bedroom. During the winter, I like to use my laptop in bed on weekend mornings, but we only have one RJ45 socket in our bedroom, which means my wife has to use the crappy WiFi connection (as I usually wake up earlier).

So I've made a plan to expand the reach of our WiFi network to the second storey. To do this, I'm buying a second WiFi access point that I can connect to one of the RJ45 sockets on our first floor. By setting the authentication settings (including the SSID) to exactly the same configuration as the existing WiFi AP, all our devices (phones, tablets, laptops, refrigerator, just kidding) will be able to connect to this second AP seamlessly. However, you need to make sure your second AP doesn't interfere with the first one. To do this, first disable any router functionality on the second AP so that it acts as a "bridge" and the first AP (and router) remains the only DHCP server. If you can, also give your second AP a static IP address outside the DHCP range so you can easily access it.

The second important step is to pick a WiFi channel for each of your APs. These channels correspond to frequencies (usually) ranging from 2.4 to 2.5 GHz, numbered 1 to 13 (with a 14th in some regions). The bandwidth of a WiFi signal is approximately 20 MHz, while the steps between channels are 5 MHz, which is why it's advised to keep 4 unused channels (a 25 MHz gap between centre frequencies) between the channels you use. This is why you'll find that most networks use channels 1, 6 and 11. You're not bound to these channels though. So, pick two sufficiently spaced channels you like (e.g. 1 and 6) to make sure the two networks don't interfere with each other.
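To make the arithmetic concrete, these are the (rounded) centre frequencies for the usual non-overlapping trio; each channel occupies roughly 10 MHz on either side of its centre:

channel 1:  centre 2412 MHz, occupies roughly 2402 - 2422 MHz
channel 6:  centre 2437 MHz, occupies roughly 2427 - 2447 MHz
channel 11: centre 2462 MHz, occupies roughly 2452 - 2472 MHz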

You can also easily expand to three APs this way, but when expanding further, make sure that APs within your network that use the same channel are placed far enough apart not to interfere with each other. This can be verified with measurement tools like WirelessMon, but it will need some careful planning.


Replacing a Bitlocker encrypted disk (with an SSD)

Our laptops sit in Dell docking stations, which means most of them are used in a (slightly) tilted position. Of course, laptops are also carried around frequently, but this tilted position seems to have a big impact on disk performance and durability. We've had hard disks breaking down after just under a year and few have survived the 2-year mark. Of course, Dell replaces these hard disks, but the disk performance has become so poor that we've decided to replace them all with SSDs.

So we bought a stack of 128GB SSDs, which led to problem 1: the source disk (320GB) is larger than the target disk. Windows ordinarily won't let you shrink a 320GB disk below the 160GB mark, and generic solutions to this problem (like using PerfectDisk to optimize a drive for shrinking) didn't work. So I thought I'd use GParted to resize the partition, but that brought up problem 2: the drives are encrypted with Bitlocker, and GParted doesn't support resizing Bitlocker-encrypted partitions.

After giving it much thought, I decided to decrypt the drive, something I was unwilling to do at first but unfortunately had to. It took just over 4 hours to finish (during which the disk rattled loudly). After that, I used GParted to resize the partition to 75 GB (to make the copy process faster), booted back into Windows and re-encrypted the drive. "Why?", you might ask. Well, I wasn't sure whether Dell would void the warranty if the laptop contained a non-Dell drive, and my contact couldn't tell for sure either, so I wanted to keep hold of the old drives until our warranty period had passed. An important thing to note is that I don't recommend storing the new encryption key on a USB drive. I hope I don't have to explain why. If you're worried about USB safety (and you should be), print out the key, put it in a sealed envelope and store it wherever your company stores important (confidential) documents.

N.B. If you are copying to a disk that is more than half the size of your current disk and you can free up more than 50% of the space on your current disk, you can skip decrypting the drive. You will most likely be able to use the Windows partition manager to shrink the partition enough to copy it to the new drive.

Copy the MBR

I use a desktop PC running Linux for development and it had 2 unused SATA ports. If you don't have one available, you can also buy an external SATA-to-USB adapter and do all of this from a Linux live CD (like Ubuntu). So I put both the old disk and the SSD into the Linux machine. Because the drives had been re-encrypted with Bitlocker enabled, I had to ensure the entire MBR would remain the same. The MBR is located in the first 512 bytes of a drive and consists of the boot code, the partition table and a signature. I wanted to copy all of it, so I issued the following dd command:

dd if=/dev/sdc of=/dev/sdd bs=512 count=1

Note that I already have 2 disks in the desktop PC I'm using to clone these drives, so the old disk and the SSD show up at /dev/sdc and /dev/sdd. If the new drive is as big as or bigger than the old drive, you can simply copy the entire drive this way (just choose a bigger block size and omit the count), but for me this wasn't the case.
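To double-check which device is which before touching anything with dd, the sizes and model names usually give it away; on most modern distributions something like this works:

lsblk -o NAME,SIZE,MODEL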

Clone the partitions

You'll now see in /proc/partitions that /dev/sdd is partitioned in exactly the same way as /dev/sdc. From here on, you can copy all partitions (one by one) from /dev/sdc to /dev/sdd. I had two partitions: the first was exactly 100 MB and the second just over 58 GB. Choosing a bigger block size (of at most the size of your drives' cache) will make the copying faster, but if your partition isn't a whole multiple of that block size, dd will run out of space on the target partition, so this needs some calculation. For me, /dev/sdc1 was easy, because it was exactly 102400 KB in size (as seen in /proc/partitions), so I chose a block size of 10485760 bytes (10 MB), meaning 10 equal blocks were to be copied. My second partition was a little more difficult: the largest block size under 32 MB that divided it evenly was 1 MB, so that's what I chose. It took just under an hour to make the copy. These are the commands I used:

dd if=/dev/sdc1 of=/dev/sdd1 bs=10485760 conv=notrunc,noerror,sync
dd if=/dev/sdc2 of=/dev/sdd2 bs=1048576 conv=notrunc,noerror,sync
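If you want to verify the arithmetic for your own partitions, a quick sanity check could look like this (sizes in /proc/partitions are listed in 1 KiB units; "sdc2" and the 1 MB block size are of course specific to my setup):

SRC_KB=$(awk '$4 == "sdc2" {print $3}' /proc/partitions)
BS=1048576
echo $(( SRC_KB * 1024 % BS ))   # prints 0 if the partition is a whole number of blocks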

Important: make sure you get the "if" and "of" bits right, please double-check! If, after copying, you discover that your old disk was at /dev/sdd and your new one at /dev/sdc, you've just erased your old disk!

Resize the disk

After this, I placed the new drive into the laptop and it immediately worked, but I had to enlarge the partition. This can easily be done in Windows' own partition manager, but be sure to temporarily suspend Bitlocker protection and restart once before re-enabling it, because otherwise you will have to dig up the Bitlocker key you just hid somewhere sneaky. Changes to the partition table will trigger Bitlocker to lock your computer and ask for the recovery key; suspending Bitlocker for the next restart prevents this from happening.
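If you prefer the command line over the GUI, suspending and resuming protection should be possible with manage-bde from an elevated command prompt (adjust the drive letter to your setup):

manage-bde -protectors -disable C:
REM resize the partition, reboot, then:
manage-bde -protectors -enable C: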


Afraid of the cloud? No, you’re not.

I sense this is becoming a very hot topic again, especially since the "celebrity nude leak scandal" that unfolded yesterday. Which is why I'd like to take this time to advocate for the cloud, or rather, tell you why it isn't really "the cloud" that's at fault here.

I'm not going to explicitly define the cloud here, because if you're reading this, you should already know what the cloud is. If you don't, check the Wikipedia article on cloud computing. That definition is quite broad, and it allows iCloud to be called a cloud service.

If the problem is indeed a security issue in iCloud (which hasn't been confirmed yet), even most of us cloud geeks will probably say that something like this was bound to happen. Everyone's been putting their images in the cloud without giving it any real thought. Apple is a company that focuses on usability, not on security. It's their main selling point, but it's also a weakness. And way too big a weakness for me.

Luckily, it doesn't seem to be a situation where there was just one big heap of data, easily unlocked with a single master key. It appears to be a flaw in the login method for Apple's Find My iPhone service, which allowed brute-force attacks (although this is still unconfirmed). This means that if you're not a celebrity, your photos probably haven't been leaked.

iCloud does in fact encrypt your data, and the key to decrypting it is your email/password combination. But that also means that anyone who knows, can guess or can brute-force your password can access your iCloud storage. It even seems that Apple archives deleted images, meaning they can be retrieved even after they've been deleted from your phone.

This is not a problem of the cloud as a whole though. Huge data storage can be protected with proper encryption methods. You can protect your account with two-factor authentication, but it's not something Apple actively encourages. Which is a shame, because it's exactly what could have saved all these celebrities from having their pictures leaked. They were probably not even aware that there was a risk involved and that they could have prevented it themselves.

Google does that better, although I feel they should also take note. The added security for Google accounts can be complicated for some and might not be as airtight as possible. I'll get back to that in another article.

We're leaving the cloud!

No, you're not. The cloud is everywhere now; leaving "the cloud" would mean shutting off your internet connection. However, you might want to think about what you're doing with all these services. For instance, have you been sending sensitive information to someone using WhatsApp? WhatsApp has been known for its security flaws (especially when used over WiFi) and no one really knows what Facebook does with your messages. And do you really need iCloud? iCloud is in fact a horribly broken concept for anyone who just wants to back up important data.

When you're done reading this, enable two-factor authentication on your Gmail account and, if you're using it, on your Apple account; you'll probably have become a lot safer instantly. But also think about what you're doing in "the cloud", because even though most cloud services are protected better than you can imagine (and some aren't, which is really worth noting), every system is vulnerable at some point in time.


Finally, SSD-backed EBS drives

Last week, Amazon Web Services announced that EBS now supports SSD-backed volumes. Although some say this isn't news, since provisioned IOPS volumes already gave you many more I/O operations per second, you can now choose between magnetic-storage-backed EBS volumes (at 5 cents per GB) and SSD-backed ones (starting at 10 cents per GB).

The "regular" SSD volumes will get you a guaranteed baseline of 3 IOPs per provisioned GB, but is able to burst to 3000 IOPs in total, which in all is much better than traditional EBS volumes, at a marginally lower price (since you don't pay for the IOPs). There are, of course, also SSD-volumes with provisioned IOPs, but it's suggested that these are actually the same as the provisioned IOPs volumes we previously had. Which makes sense, because magnetic storage could never provide 4.000 IOPs.

One little thing to consider: if you've previously always provisioned 1 TB for all your EBS volumes to ensure consistent performance, that's still the most cost-effective strategy. A "regular" 1 TB SSD volume will cost you $100 per month and get you 3,000 IOPS, while getting 3,000 provisioned IOPS on a 100 GB volume will set you back a little over $200. This might need some thought, but it's really worth noting.
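As a rough back-of-the-envelope comparison (the provisioned IOPS rates used here, roughly $0.125 per GB-month and $0.065 per provisioned IOPS-month, are assumptions based on the us-east-1 price list at the time of writing, so check the current prices before deciding):

1 TB general purpose SSD:  1,000 GB x $0.10  = $100.00 per month (bursts to 3,000 IOPS)
100 GB provisioned IOPS:     100 GB x $0.125 = $ 12.50 per month
3,000 provisioned IOPS:       3,000 x $0.065 = $195.00 per month
                                       total = $207.50 per month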


Optimizing Magento to become a speed monster

The post title might be a bit bold but, contrary to popular belief, it's in no way impossible to have a blazingly fast Magento store. Most of the gains aren't in quick fixes though. Some of the changes will require quite a bit of time.

MySQL query caching

MySQL isn't optimized for Magento out of the box. There's one general mechanism that can make a world of difference for Magento, which is called the query_cache. Try playing with the following settings and measure the results (e.g. using New Relic):

query_cache_type = 1
query_cache_size = 256M

This allows MySQL to store up to 256M of query results, so that the most common queries (like the ones that load your frontpage products) can be served straight from memory. You can evaluate the actual query cache usage by querying MySQL's status counters.
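A quick way to do that from the shell (or run the SHOW STATUS statement inside the mysql client; the interesting counters are Qcache_free_memory, Qcache_hits and Qcache_lowmem_prunes):

mysql -u root -p -e "SHOW STATUS LIKE 'Qcache%'"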


A production database should show some free memory for the query cache. Increase the size if it runs out.

PHP Opcode caching

Opcode caching is a mechanism that enables you to store the compiled opcode of your PHP files in shared memory. This reduces access time and eliminates PHP parsing time, meaning PHP files can be executed faster. For Magento, this could easily reduce the time needed for certain actions by seconds. My favourite opcode caching mechanism is APC, simply because it's maintained by the folks at PHP.
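As a starting point, the corresponding php.ini settings could look something like this (the shm_size value here is an assumption; tune it to the size of your codebase and measure, just like the query cache):

extension=apc.so
apc.enabled=1
apc.shm_size=256M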

Magento Compilation

Even though recent developments have reduced the need for the Magento compiler, there's still a (small) performance gain, if you don't mind having to recompile your installation after you make changes to its files. The Magento compiler concatenates PHP files and puts them in a single folder, which reduces the time Magento has to spend in the filesystem.
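For reference, compilation can be managed from the shell in the Magento root (there's also a page for it under System > Tools > Compilation in the admin); a typical sequence is something like:

php -f shell/compiler.php -- state
php -f shell/compiler.php -- compile
php -f shell/compiler.php -- enable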

Store Magento cache in memory

By default, Magento will store its cache files under var/cache. Now, let me first point out that caching should always be enabled for performance reasons (simply because it reduces disk I/O and database calls). However, storing these cache files on the file system will still induce an I/O overhead. If your server is backed by SSD disks, this overhead is pretty small, but if not, you can gain a lot by storing this cache in shared memory. Magento supports memcached out-of-the-box (although your server of course needs memcached installed), but I recently switched to Redis using an extension called Cm_Cache_Backend_Redis. We run it on a separate server because we have multiple webservers, but you might not need to.

Use SSD disks

Magento is quite I/O-heavy. Where an IDE drive will only manage somewhere between 150 and 200 IOPS (and its performance degrades over time), an SSD can easily do 50 times as many IOPS or even more. If I/O is your bottleneck, moving to SSDs is the way to go.

Use front end caching

I know that there are Full Page Caching modules available for Magento, but I recommend dropping them in favour of a front end caching mechanism. Varnish is probably the way to go at this time. The main reason to go for front end caching is that it's highly configurable.

Varnish stores a copy of the webserver's response in shared memory. If another visitor visits the same page, Varnish will serve the page from memory, taking the load off the webserver. Because some pages are dynamic or have dynamic parts, it's important to configure Varnish so that it only caches content that is (more or less) static. Varnish also supports ESI, which enables you to pass through blocks of content from the webserver. If you use a Magento extension to enable Varnish caching, it will do most of the configuration for you (although extensive testing is required).

There are two extensions I have tried, the first one being PageCache. I prefer Turpentine though, because it's more powerful and supports ESI.


Stop developing apps – NOW!

I don't know if I'm alone in this, but I'm getting tired of companies and governments launching expensive apps to get their message across. If it were up to me, we'd be at the very end of the app era.

Of course, app developers would disagree and I'm not really being fair with this title (but hey, titles are evil, everyone knows that), because I'm not really talking about every kind of app. Mobile gaming is the best example of legitimate app development. Still, I think I'm being generous if I say less than 20% of all apps have real value.

What I am talking about are native apps that don't provide any content or functionality a website couldn't. Small hotels and restaurants developing reservation apps that cost tens of thousands of dollars (or euros) are just throwing money down the drain. Or a local business launching an app with opening hours and directions (yes, I've seen this). But more importantly, I've seen businesses that don't have a proper mobile site, but do have an iPhone app (no Android though, because that's too expensive). I don't want your app that I'm only going to use once, I don't have an iPhone (well, I do, but I don't want to), I just want to see where you are and whether you sell what I'm looking for!

Especially to the last kind of business (which is a bigger group than you think): don't try to get on the bandwagon and be "hip". Invest time in your website, make sure it functions properly on a PC, tablet and phone (the term you might want to embrace is "responsive design"). If you then still have money left, I'd strongly encourage you to NOT THROW IT INTO SOME KIND OF STUPID APP!!!


CSS and Javascript parsing time related to page speed

For a client, I've been implementing front-end caching in a couple of test scenarios. I managed to get the TTFB (Time To First Byte) down to under 100 milliseconds (and coming from 1.6s on average, that's a big win), but I still wasn't happy with the full loading time. We were using a self-built CDN for our CSS, Javascript and images, which meant that these were also loaded in a few milliseconds, but the time between loading the page and loading the last couple of images was still 1.5 seconds. So, what happened there? There was no cascading effect for these images (they were all requested at the same time), but my Firebug Net panel showed some odd behaviour.


Now, this is something I haven't really seen before. Between the moment the Javascript and CSS are delivered and the moment the images are loaded lies about 1 second of nothingness. That can't be right, can it?

Of course, I'm exaggerating a little here. I have seen this before and I know exactly what it means, but the scale surprised me. What it means is that it takes my browser a full second to parse the CSS and Javascript. So no matter how much I speed this website up on the server side, it will never fully load in under 1 second, simply because of the way it's built. Having always dealt with server-side performance issues (which were quite straightforward in this case), this was the first time I actually had to deal with client-side performance.

You might notice that there are 17 requests for Javascript and CSS files. However, that's not really the issue here. Of course, it's best practice to combine and compress these files (preferably using application logic), which is something I will get to, but for testing purposes keeping them separate actually helps in this case.

The CSS files loaded were big. Very big. They totalled a whopping 220 KB. Looking at them, it seemed to me that someone had taken the C in CSS a little too literally: for every change to the website (including big redesigns), new lines were added to the CSS and the old CSS was left untouched. I could spot blocks spanning hundreds of lines that were completely unused throughout the entire website, relating to things that once existed but had since been removed. So I downloaded CSS Usage, a Firefox plugin that can spot unused CSS rules. With this tool, I was able to remove over a thousand lines of unused CSS in less than an hour.

However, the effect was barely measurable. Even after deleting a thousand lines, I couldn't find any significant improvement in loading time. I needed some kind of profiler. Now, I'd been told that Opera has a CSS profiler built in. And I once loved Opera (before Mozilla came into the picture), so I was happy to give it a whirl. Its timeline gave me a very good look at what happens between receiving the CSS and finishing the actual page build. But its profiler was also able to record which CSS selectors were used across the entire website (by recording usage while clicking through it). That meant I was able to remove -all- unused CSS selectors. Talk about a clean-up, eh?

After doing this, the timeline seemed to approve of my changes. CSS loading time went down, but I'm still not entirely happy. At this point though, it got a little too front-endy for my taste, so I might get back to it at a later time. I could imagine that certain types of selectors or style properties are slower than equivalent pieces of code, or maybe our stylesheet as a whole simply got too heavy. Do note that this -is- a real problem with the growing number of mobile devices visiting these websites, so it's something we will have to be very wary of.

By moving most of the Javascript from the head tags to the bottom of the page (which is something I've done on most of my projects for a while now), the total loading time didn't get lower, but the DOM loads a lot faster. If your website isn't built with Javascript (which is done more and more in HTML5 applications), this can make your website appear a lot faster. Be aware though, that if you have Javascript that's essential to how your website works (like shopping cart functionality), you might want to keep that in the head, as loading it late might frustrate end users.

Speeding up the Javascript side of things is a lot harder though. I tend to stay away from heavy JS frameworks like jQuery as much as possible, but I'm no front-ender. Most front-enders I know start with some kind of CSS bootstrap, include jQuery and build from there. I like my code to be a little more modular if possible (although I'm very aware of the fact that I use Ruby on Rails a lot, which will provoke a big "ugh" from people coming from leaner development backgrounds). jQuery is very powerful, but that also makes it heavy, which is not always needed.

However, I haven't really found the big answer. The client was quite happy (even more so after looking at the numbers), because I did my job well. At some point I will have to hand the task back to a front-ender, but I really think a big redesign (from the ground up) is warranted in this case.


If your Magento backend doesn’t seem to load its CSS/JS files correctly, take note

I've had the following problem before, so I should probably have known what it was, but I failed to take note last time. After a Magento migration (to a test location), I quickly had the front-end up and running, but the backend (admin section) didn't seem to load its CSS stylesheets and/or Javascript files properly. All references to these files pointed to "skin/adminhtml/default/..." instead of the full URL (or even a relative path).

When looking around the internet, I found that most people with the same problem had simply forgotten to flush their cache. So that was not very helpful. Some people had problems because they forgot to add a trailing slash to their web/(un)secure/base_url in core_config_data, which is also a quick fix. I was still having problems though.

Until I started digging through my core_config_data table, looking for a path LIKE '%url%'. And I found this entry:

+-----------+--------+----------+-----------------------+-------+
| config_id | scope  | scope_id | path                  | value |
+-----------+--------+----------+-----------------------+-------+
|      1419 | stores |        0 | web/unsecure/base_url |       |
+-----------+--------+----------+-----------------------+-------+

Hmmm, that looks fishy. On our live environment it doesn't seem to pose any problems, but here it overrides our default web/unsecure/base_url setting with an empty value. So I removed the row and everything was fine.
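For reference, the lookup and the clean-up boil down to something like this (the database name is a placeholder and your config_id will differ, so check the SELECT output before deleting anything):

mysql -u root -p magento_db -e "SELECT config_id, scope, scope_id, path, value FROM core_config_data WHERE path LIKE '%url%'"
mysql -u root -p magento_db -e "DELETE FROM core_config_data WHERE config_id = 1419"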


VPC now supports EIP management on instance launch

Before, it was impossible to specify whether you wanted a public (or Elastic) IP address when launching an instance into a VPC subnet. That was one of the main reasons I'd been unable to create resilient NAT instances without relying on other instances: either you use a default VPC (I never do) and always have a public IP assigned, or you have a custom VPC and no public IP assigned.

NAT instances are very important in nearly all my setups. But when one fails, the chain of events that follows has always depended on other instances. When the NAT instance in AZ 1a fails, the NAT instance in AZ 1b takes over by changing the route table. It also changes the route table for AZ 1a, turning the formerly public subnet into a private subnet that routes through AZ 1b. When the NAT instance in AZ 1a relaunches through auto scaling, it requests an EIP, attaches it, then changes the route table for AZ 1a back to a public subnet and stands on its own feet again. It works, but during the failure the NAT instance in AZ 1b becomes a single point of failure until someone manually restores the original situation. I've had problems with this setup on a few occasions, so I was eagerly waiting for Amazon to come up with a way to simplify this process.
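For context, the route-table switch in that takeover step is a single API call per route table. A sketch with placeholder IDs (a real failover script would of course add health checks and error handling):

aws ec2 replace-route --route-table-id rtb-xxxxxxxx \
    --destination-cidr-block 0.0.0.0/0 \
    --instance-id i-xxxxxxxx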

A few days ago, Amazon announced that it now supports EIP management on instance launch. That means that when launching an instance through the console, you can choose whether you want to assign an EIP. They also announced that they're working to support this for the auto scaling service and that's what I've been waiting for. It means that I can relaunch my NAT instance automatically through auto scaling (with a --min-size and --max-size of 1), attach an EIP and change the route table, without relying on a second instance.
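On the CLI side, the new option appears to be exposed as an associate-public-ip-address flag on instance launch; a sketch with placeholder IDs:

aws ec2 run-instances --image-id ami-xxxxxxxx \
    --subnet-id subnet-xxxxxxxx \
    --instance-type t1.micro \
    --count 1 \
    --associate-public-ip-address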


Why I don’t use custom AMIs for EC2 instances

When I started using AWS, most of the documentation on auto scaling and EC2 in general advised creating AMIs to launch a copy of your instance when, for example, scaling up. And it sounds convenient: you don't need to install packages and configure your server. You simply copy the running instance and you're done. When I started auto scaling, I quickly decided this method was not for me.

I found that every time I changed something on one of my servers, I had to create a new AMI from that instance, create a new launch configuration and terminate the running instances so they would relaunch from the newly created AMI. While this works, I've found that using user-data files is much cleaner than using custom AMIs. You can easily switch to a new Amazon AMI when one is released, and a user-data file ensures you only install the packages you actually need (while a custom AMI can build up lots of redundant packages and files over the years).
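To illustrate, attaching a user-data script to an auto scaling launch configuration could look something like this (the names and AMI ID are placeholders; the script itself follows below):

aws autoscaling create-launch-configuration \
    --launch-configuration-name webapp-2014-09 \
    --image-id ami-xxxxxxxx \
    --instance-type m3.medium \
    --user-data file://launch-script.sh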

But the most important reason for me to do this was simplifying the release of new application versions. When using an AMI, I'd have to update the application code on one instance, create the AMI, create my launch config, update the auto scaling group and terminate all other running instances. Using a launch script, I can simply push my code to the "stable" branch in git and start terminating instances one at a time. The launch script ensures all instances have the right packages installed and pull the latest code from git. All our webservers are in a private subnet, connecting through a NAT instance, so the git repository can be set up to only allow access from our public NAT IP addresses. In fact, you can have a git repo within the private subnet that isn't password protected for this purpose.

An example user-data script for a Passenger/nginx server hosting a Rails application could look something like this:

# Install packages non-interactively (a user-data script can't answer prompts)
yum -y update
yum -y install git ruby19 httpd gcc gcc-c++ ruby19-devel curl-devel openssl-devel zlib-devel
gem1.9 install passenger bundler --no-ri --no-rdoc
passenger-install-nginx-module --auto --auto-download --prefix=/opt/nginx
# Create the application user, clone the application and install its gems
adduser webapp
cd /home/webapp && git clone YOUR_GIT_REPO:yourapp.git && cd yourapp && bundle install
# Ship the nginx config and init script with the application itself
cat /home/webapp/yourapp/config/custom-configs/nginx.conf > /opt/nginx/conf/nginx.conf
cat /home/webapp/yourapp/config/custom-configs/initd-nginx > /etc/init.d/nginx
chown -R webapp:webapp /home/webapp/*
chmod 755 /home/webapp
chmod +x /etc/init.d/nginx && chkconfig nginx on && service nginx start

The two cat lines might need a bit of explanation. I've chosen to ship the nginx configuration and an init.d script with the app. I could easily have put those on something like S3, but since nginx is installed automatically with every deploy, I felt this was just as easy. However, if you make regular changes to your nginx.conf file, you might want to do this differently.

If you combine this with a Capistrano script that iterates through your running instances (tag your auto scaling group to easily and automatically find the right instances) and shuts them down one at a time, you have fully automated deployment in a clustered environment, without having to maintain your own AMIs. It's as simple as git push && capistrano deploy!