High availability, bulletproofing the cloud.


This year has seen some very high-profile failures at major cloud computing providers. One thing that stands out for me in these is an almost total inability to restore running customer workloads without large amounts of manual intervention, usually by the customers themselves.

Silly human-caused outages aside, data center infrastructure is almost guaranteed to suffer outages, despite the best efforts to the contrary. Physical equipment is fallible, all the way from the data center backup power generators to the fanciest server. Cloud computing magnifies the impact a lot, since the same infrastructure supports 10-20 times the number of customers.

What’s giving cloud computing a bad name for reliability isn’t the failures, it’s the lousy response to them. Taking hours or even days to restore customer workloads when the cloud provider still has operating capacity (a partial failure) is simply ridiculous. Expecting customers to monitor their virtual machines themselves and restart them manually after a failure is guaranteed to make people even unhappier, and that’s on top of the irritation at the service going down in the first place. I think there’s a WAY better way to handle this.

For quite a while now, our open source project, Xen Cloud Control System for the excellent Xen Cloud Platform, has featured the ability to automatically recover from most cloud failures. Your virtual machine stops running? XCCS automatically restarts it. A physical cloud host fails? XCCS restarts everything that was running on it on the other hosts. No muss, no fuss.
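To make the host failure case concrete, here is a minimal sketch in Python of the idea. This is NOT the XCCS code. The `recover` function, the dictionary of placements, and the least-loaded rule for picking a new host are all assumptions made up for illustration.

```python
def recover(placement, failed_host, live_hosts):
    """Return a new placement with every VM from failed_host moved elsewhere.

    placement maps VM name -> host name. This data model is hypothetical,
    chosen for illustration, and is not how XCCS stores its state.
    """
    survivors = [h for h in live_hosts if h != failed_host]
    if not survivors:
        raise RuntimeError("no surviving hosts to restart on")

    # Count the VMs already running on each surviving host.
    load = {h: 0 for h in survivors}
    for host in placement.values():
        if host in load:
            load[host] += 1

    new_placement = dict(placement)
    for vm in sorted(placement):
        if placement[vm] == failed_host:
            # Assumed policy: restart on the least-loaded survivor,
            # breaking ties by host name so the result is predictable.
            target = min(survivors, key=lambda h: (load[h], h))
            new_placement[vm] = target
            load[target] += 1
    return new_placement


placement = {"web1": "hostA", "web2": "hostB", "db1": "hostA"}
print(recover(placement, "hostA", ["hostB", "hostC"]))
```

A real system also has to detect that the host is actually dead and then boot the virtual machines on their new hosts. The sketch only covers the decision about where everything goes.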

With the release of XCCS version 0.5.4 today, we’ve introduced the ultimate automated feature to make sure the customer’s service stays up and running, no matter what. The new “unfriend” feature ensures that two virtual servers that are unfriended will never run on the same physical host server. This means that a partial failure of the cloud or data center infrastructure will NOT take out two redundant virtual servers. Combine this with the automatic restart facility of XCCS and the customer’s service doesn’t even sneeze at the failure.
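The unfriend rule is what is usually called anti-affinity placement. Here is a rough Python sketch of the idea, under stated assumptions: the `pick_host` function, the set of unfriended pairs, and the least-loaded tie-break are hypothetical names and choices for illustration, not the actual XCCS implementation.

```python
from collections import Counter


def pick_host(vm, hosts, placement, unfriended):
    """Pick a host for vm that runs none of the VMs it is unfriended from.

    hosts is a list of host names, placement maps running VM -> host, and
    unfriended is a set of frozenset({vm_a, vm_b}) pairs. All of these
    structures are hypothetical. Returns None if no host qualifies.
    """
    # Any host already running an unfriended partner is off limits.
    blocked = {
        host
        for other, host in placement.items()
        if frozenset((vm, other)) in unfriended
    }
    candidates = [h for h in hosts if h not in blocked]
    if not candidates:
        return None

    # Assumed policy: prefer the least-loaded allowed host, ties by name.
    load = Counter(placement.values())
    return min(candidates, key=lambda h: (load[h], h))


hosts = ["hostA", "hostB", "hostC"]
placement = {"lb1": "hostA", "web1": "hostA"}
unfriended = {frozenset(("lb1", "lb2"))}
print(pick_host("lb2", hosts, placement, unfriended))  # hostB
```

The same check has to apply when a virtual server is restarted after a failure, not just when it is first started. Otherwise recovery could put two unfriended servers right back on the same host.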

Want the ultimate in bulletproof web servers? Take two virtual server load balancers, set up heartbeat for failover, and then unfriend them. Add two (or more) virtual web servers and unfriend them. Now you have a self-healing web server that will NOT go down for anything short of a total smoking hole disaster. Completely automatic disaster recovery, courtesy of the RELIABLE cloud. This is the way cloud computing should be.

Call me at 207-399-7108 or email me today for bulletproof virtual servers that you can count on to stay up and running through anything!

Vern
