Tuesday data center tidbits.

In the news today is the crash of the NaviSite data center in San Jose as the result of a backup power failure. This is the best example I’ve seen of why the move to extremely short-run backup power, such as flywheels, is a totally boneheaded idea. If everything goes perfectly, a short carryover may be just fine, but counting on perfection from generators is bad odds.



4 responses to “Tuesday data center tidbits.”

  1. The problem here wasn’t a battery, or a flywheel, or the amount of time either of them could provide. The problem was having devices in the critical power system that could not withstand and operate in the exact conditions they are installed to protect against. The bonehead idea is using transfer switches, batteries, and static UPS systems in mission critical applications, when they cannot withstand and operate in conditions like a severe thunderstorm. It is amazing that you are bashing the products and systems that are resilient enough to work in these conditions.

    • The problem was not that the equipment wasn’t resilient enough; the problem was that the installation was improperly engineered. The storm just brought the issue to a head.

      The truth is that the generator is one of the most fallible pieces of gear in the data center and betting the entire farm on it working perfectly every time, which is exactly what flywheels do, is what is boneheaded.

      I’d rather have enough time to work around the problem than deal with the pain of the whole works going down flat.


  2. From the limited information available, I’d agree that it sounds like it was improperly engineered, which is my point. A properly engineered system includes redundancy AND resiliency.

    Truth is, batteries are no better than a generator if they aren’t monitored and maintained, and vice versa. If you take care of them, they work. Assuming these are critical loads, you don’t bet the farm on a single generator or a single string of batteries; you need redundancy.

    But having redundant systems that can’t handle what comes through the power lines is equally foolish. That’s why people use a flywheel for a DC power source, and a rotary UPS close-coupled to a generator, to ensure that they will have power. There are even people using this configuration in conjunction with batteries and calling it “battery hardening”. Go figure!

    This might sound sarcastic, but how much time is “enough” to work around a problem? Assuming they’re both monitored and maintained, I’ll bet your batteries run out of charge before my generator runs out of fuel.

    • I typically size the strings in my DC power plant for a run time of 6 hours or more. It’s worth noting that telecom power plants, even with auto-starting generators, are built to an 8-hour run time standard.

      I’ll take the odds of my properly maintained battery string over your generator firing up perfectly every time. Just from casual observation, I would say 75% of generator fail-to-starts can be resolved by on-hand data center staff. Figuring a gen should start in 15 seconds or less, the NOC should have the alarm within 30 seconds. After that, a manual start is a matter of how long it takes to get a tech from the NOC to the gen.

      I’d be willing to bet that you could cure most gen fail-to-starts in less than 1 hour.
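
The timing argument in this reply can be put into a rough back-of-the-envelope sketch. The generator start time, NOC alarm time, one-hour repair window, and 6-hour battery string come from the comment above; the 20-second flywheel ride-through is purely an assumed placeholder, since no figure is given in this thread, and the function name is hypothetical.

```python
# Figures quoted in the comment above.
GEN_AUTO_START_S = 15             # generator should start in 15 seconds or less
NOC_ALARM_S = 30                  # NOC should have the alarm within 30 seconds
MANUAL_FIX_S = 60 * 60            # most fail-to-starts cured in under 1 hour
BATTERY_RUNTIME_S = 6 * 60 * 60   # battery strings sized for 6 hours or more

# ASSUMPTION: hypothetical flywheel ride-through, not stated in the thread.
FLYWHEEL_RUNTIME_S = 20


def load_stays_up(ride_through_s, generator_starts):
    """Return True if the stored energy outlasts the time to get the generator online.

    If the generator auto-starts, only the start time must be covered.
    If it fails to start, the alarm delay plus the manual repair time must be covered.
    """
    if generator_starts:
        needed_s = GEN_AUTO_START_S
    else:
        needed_s = NOC_ALARM_S + MANUAL_FIX_S
    return ride_through_s >= needed_s


if __name__ == "__main__":
    for name, runtime in (("flywheel", FLYWHEEL_RUNTIME_S), ("battery", BATTERY_RUNTIME_S)):
        for starts in (True, False):
            outcome = "stays up" if load_stays_up(runtime, starts) else "goes down"
            print(f"{name:8s} generator_starts={starts!s:5s} -> load {outcome}")
```

Under these assumptions, both options carry the load when the generator starts on cue; only the long-run battery string leaves room for a manual fix when it does not.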

