I knew it was going to happen the moment I read about Intuit face planting their data center and web sites for 36 hours. The anti-cloud-computing crowd is out in force with their mantra that this “proves” that cloud computing is unreliable. What it does prove is that if people can’t come up with a good argument against something, a silly one will do in a pinch.
So, what exactly happened with Intuit? We do know that a power failure resulting from “routine maintenance” took down both primary and backup servers. I haven’t seen a detailed analysis of it yet, but a little deductive reasoning will reveal the likely chain of events.
Unless the data center power design is totally nuts, any power failure that takes out both primary and secondary systems would have to be in the high voltage primary power coming into the data center (à la the catastrophic power failure at The Planet’s data center in 2009). “Occurred during routine maintenance” is a code phrase that roughly translates as “We were screwing around inside of live power equipment doing something we didn’t really need to be doing and someone messed up”. This has been the cause of many data center power failures over the last year.
Looking at the history of these events, it’s easy to see this has no relationship whatsoever to cloud computing, nor does it reveal any inherent weakness in cloud computing. So, just what does this outage show?
First, the folly of putting all your critical services in one data center.
Second, that it takes a total “smoking hole” disaster to disrupt cloud computing (showing up the lie that cloud is less reliable than a dedicated server).
Third, that Intuit (and other cloud providers) don’t understand that the consequences of failure in a cloud are far higher than the equivalent failure of a dedicated server and their infrastructure has to be designed for that (failure of a single cloud host will take out 10x or more the service that failure of a single dedicated server will).
Fourth, that Intuit (and other cloud providers) don’t take advantage of the features of clouds to automatically restore downed services. Have a hardware failure in our cloud and, as long as any of the cloud is still running, virtual servers will be restored and running in 15 minutes or less.
Fifth, that Intuit (and other cloud providers) fail to correctly assess the risk of doing “routine maintenance” on live data center power equipment.
So, what does this leave us with? Understand that a single cloud server is far more important than a single dedicated server, segment power so that no one failure will kill everything, run backup services in a separate data center, automate cloud disaster recovery, and stop monkeying around inside of live power equipment.
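To make the "automate cloud disaster recovery" point concrete, here is a minimal sketch of the idea in Python. It is not how our cloud or anyone else's actually does it; the `Host` class, the `restore_vms` function, and the "least-loaded survivor" placement rule are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical cloud host; 'capacity' is the max number of VMs it can run."""
    name: str
    capacity: int
    alive: bool = True
    vms: list = field(default_factory=list)


def restore_vms(hosts):
    """Move VMs off dead hosts onto the least-loaded surviving hosts.

    Returns (placements, unplaced): a dict of VM name -> new host name,
    and a list of VMs that could not be restarted anywhere.
    """
    survivors = [h for h in hosts if h.alive]
    placements = {}
    unplaced = []
    for dead in (h for h in hosts if not h.alive):
        for vm in list(dead.vms):
            # Only consider survivors that still have room.
            candidates = [h for h in survivors if len(h.vms) < h.capacity]
            if not candidates:
                unplaced.append(vm)
                continue
            target = min(candidates, key=lambda h: len(h.vms))
            target.vms.append(vm)
            dead.vms.remove(vm)
            placements[vm] = target.name
    return placements, unplaced


hosts = [
    Host("a", capacity=3, vms=["web1"]),
    Host("b", capacity=3, vms=["db1"]),
    Host("c", capacity=3, alive=False, vms=["mail1", "web2"]),
]
placements, unplaced = restore_vms(hosts)
```

Note what happens when every host is dead, as when the whole data center loses power: there are no survivors, so every VM lands in `unplaced`. Automation only helps while some of the cloud is still running, which is exactly why the backup has to live in a separate data center.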
Would you look at that, it isn’t cloud computing’s fault after all.
Email or call me or visit the SwiftWater Telecom web site for cloud computing services.