Dogfooding

11 Nov

If you’re reading this blog, you’re probably not thinking about the Apache server that delivers its pages, or about the WordPress application we use to produce and manage the content. Nor are you thinking about the cloud it’s running on (GoGrid) or the constant monitoring and backups happening in the background. And there’s no reason you should.

It’s our job to think about these things and to ensure that the 60+ applications we support, not just our own, are available at all times and that no data is ever lost. We built our system with redundancy baked in so that backups are always available and restoring is a seamless, painless process.

We know this works: we test it extensively at every production release and regularly stress-test it under unusual load. But we got a chance to take the customer view a couple of weeks ago, when this blog became unresponsive and we ended up creating our own case study.

We received an alert that the blog was unresponsive; Apache wasn't serving pages, and launching the terminal from the manage application page gave us only a blank screen. We did what you’d expect and requested a reboot, which successfully brought the server back online. However, a few weeks' worth of data seemed to be missing.

After getting over the panicky feeling that comes with the prospect of data loss, we turned to our own Standing Cloud system to launch a "Preview Restore" of the previous night's backup. Preview Restore gives you the chance to view your various backup files and select the one you want. The preview is available for 30 minutes, and we used that time to verify that all of the missing data was there. We then shut down the preview, chose the "Restore" button this time, and a short while later we were back up and running with no data lost!

We can't say for sure, but we believe this happened because the virtual server the blog was running on had to be reset (by the provider) without a proper shutdown, resulting in some data not being written to the hard drive. If it weren't for our automated backups and application management, we'd probably still be trying to get all the data back.
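
For the curious, here is a minimal Python sketch of why an abrupt reset can lose data that an application believes it has already saved. It is purely illustrative and is not how WordPress, its database, or our platform handles writes; the `durable_write` helper and the file name are hypothetical. The point is only that a write normally lands in memory buffers first and reaches the disk later, unless the program explicitly asks for it to be committed.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes to path and ask the OS to commit them to storage.

    Hypothetical helper for illustration only.
    """
    with open(path, "wb") as f:
        f.write(data)
        # Move the data from Python's own buffer into the OS cache.
        f.flush()
        # Ask the OS to push its cached copy to the storage device.
        # Without this step, a reset without a clean shutdown can
        # discard data that was only held in memory.
        os.fsync(f.fileno())


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "post.txt")
    durable_write(path, b"draft saved")
    with open(path, "rb") as f:
        print(f.read())
```

Even with explicit syncing, the virtualization layer or the hardware can still cache writes, which is one more reason to rely on regular backups rather than on any single server.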
