Back in April, Dave Jilk, our CEO, wrote a blog post about the intelligent, reliable, consistently-going-above-and-beyond-the-call-of-duty team that we have here at Standing Cloud. I started working here just over a year ago, and I’m also repeatedly impressed with our team and the quality of our service. On Monday, when Amazon Web Services’ EC2 servers went down on the East Coast, I had one more reason to be proud.
Before I get into the details from Monday, I want to talk about our company’s ideals. Many months ago, our management team set out to define our values. Now we use Yammer, the in-office social network, to keep that list at the forefront of our minds by posting about anyone who particularly exemplifies those values. Number one on the list (and I honestly don’t know if these are in a particular order, but I like that this one is at the top) reads: “We are intensely focused on our customers and users, and on their success.” I love that, and I’m going to forgo the standard Yammer message (hashtag values) and blog about how well I think our team kept our customers’ success in mind during the AWS collapse on Monday.
Amazon is one of the many clouds we support, meaning users can spin up software on EC2 through our system (if Amazon is up and running, that is). When the Amazon outage hit on Monday, we found out about it fairly quickly and went to work figuring out its impact on the usability of our system and whether our users were being affected.
Our first move was to test whether we ourselves could get software installed on an Amazon server. Not surprisingly, that didn’t work. When we tried again it did spin up; however, with all of the problems seeming to snowball, we decided to disable Amazon until they reported that everything was running smoothly.
The Standing Cloud system has a simple but incredibly useful feature (my favorite feature) called auto-restore. We routinely back up each user’s database and software on a different cloud from the one they are hosting on, so if the user’s chosen server, or even cloud (gasp! not the cloud!), goes down, and the user has auto-restore enabled, our system will detect the outage and automatically migrate everything over to a working server almost immediately.
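For the curious, here is a rough sketch of the decision auto-restore makes, written in Python. To be clear, this is not our actual code: the names (`Deployment`, `choose_restore_target`, the cloud labels) are all made up for illustration, and I’m assuming the health check is just a function that says whether a given cloud is reachable.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Deployment:
    """Hypothetical record of one user's installation (illustrative only)."""
    app: str
    host_cloud: str      # cloud the app currently runs on
    backup_cloud: str    # a *different* cloud that holds the backups
    auto_restore: bool   # whether the user enabled auto-restore


def choose_restore_target(
    dep: Deployment,
    candidate_clouds: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the cloud the app should run on after a health check.

    - If the current host is healthy, stay put.
    - If it is down and auto-restore is on, pick the first healthy
      candidate cloud, provided the backup cloud is reachable too
      (the backup has to be readable to restore from it).
    - Otherwise return None, meaning a manual restore is needed.
    """
    if is_healthy(dep.host_cloud):
        return dep.host_cloud
    if not dep.auto_restore or not is_healthy(dep.backup_cloud):
        return None
    for cloud in candidate_clouds:
        if cloud != dep.host_cloud and is_healthy(cloud):
            return cloud
    return None


# Example: the host cloud is down, the other two are fine.
status = {"cloud-a": False, "cloud-b": True, "cloud-c": True}
dep = Deployment("blog", host_cloud="cloud-a", backup_cloud="cloud-b", auto_restore=True)
target = choose_restore_target(dep, ["cloud-a", "cloud-b", "cloud-c"], lambda c: status[c])
```

In this example `target` comes out as `"cloud-b"`, the first healthy cloud that is not the failed host. The part that matters is keeping the backup on a different cloud from the host, so a single outage can’t take out both the app and the copy you would restore from.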
So, after disabling Amazon for new installations in our console, we moved on to our Amazon users who did not have the auto-restore feature enabled. Even though those users would be able to manually restore on their own, our support team wanted to preemptively check in. Luckily all of their applications were available and running fine, and a quick check revealed that all of our Amazon users with auto-restore enabled were working too.
It’s great that no one using Amazon on Standing Cloud was hurt by the outage (since we don’t currently deploy our EC2 servers with EBS volumes, the outage problems didn’t affect our system), but what really makes me happy about Monday is how our support and dev teams handled the situation. The conversation didn’t go: “Hey, did you hear Amazon is down on the East Coast? Looks like I won’t be logging into Pinterest anytime soon!” Instead it was more like: “Hey, Amazon is down. Our users are probably fine, but let’s check in on their installations just to make sure.”
Definitely a great example of our team exemplifying our values and that whole “above and beyond” thing I was talking about before. And I think it’s what makes Standing Cloud an awesome team to be a part of.
Oh, and also, all those sites that went down? (Especially the little guys without big names like Reddit & Foursquare.) If they’d been using Standing Cloud to host their stuff, they’d have been back up in minutes. Just sayin’... auto-restore & cross-cloud backups ftw!