Today I was reading about the “trade off between reliability and efficiency” in the data center. I think it’s far from the truth that you have to give up one for the other.
Part of the problem that causes this kind of misconception are obsolete classification systems, such as the Uptime Institute’s tiers (I’ve written before about my problems with that particular classification system). In the example given in the article, the data center operator in question had to maintain 1 to 1 hot standby servers for every operating server to achieve that particular tier rating, as if reliability couldn’t be achieved by anything less than exact duplicates of every piece of gear in the data center. Of course, the 2N approach ignores the possibility, what if you waste all that money and energy running 1 to 1 hot standbys, the primary fails, and then the secondary immediately fails?
Of course, the Uptime Institute’s response to this is to announce yet ANOTHER data center efficiency metric.
This also spotlights the weakness of using PUE as anything but an internal engineering aid. It sounds great that you have a 1.8 PUE but, since PUE doesn’t have any reference to the amount of work being accomplished, you’ll be wasting half the energy consumed on equipment producing no useful work. The cost of operating this way plus the likely upcoming carbon penalties will melt your credit card.
So, how you combine green data center techniques with high reliability? Here’s my recipe for it:
Add a green modular DC power plant with enough modules to provide n+1 (or 2 or 3) capacity. Split AC power feeds for the modules between 2 or more AC sources.
Add 2 parallel battery strings.
Add in 1 cloud computing cluster, such as the excellent Xen Cloud Platform we use. Provision enough cloud hosts for n+1.
Split cloud hosts with multiple DC power feeds.
Add in redundant storage servers.
Add in a load balancing system capable of automatically restarting virtual machines if a cloud host fails.
Add in good site design practices.
(note, this is exactly the way we run)
The fact is that nothing short of a smoking hole disaster is likely to interrupt the service provided by this configuration for longer than the time required to automatically restart a virtual machine. If the time to restart is an issue, specify a backup virtual machine on a different host in the cloud. Protect against the possible data center wide disaster with a duplicate configuration on a cloud in a second data center.
It’s really not that hard to achieve high reliability with green data center techniques (even if Microsoft, Amazon, Google, and Rackspace make it look like it is). Deep six the antiquated thinking and your wallet and the planet will thank you for it.
High reliability services available!
Vern