What is it with data center providers and an inability to keep the lights on? Tonight I’ve been reading “Power Outage at ServerBeach Facility.”
Apparently this outage was the result of a failure during “non intrusive” electrical maintenance (I have a hard time classifying maintenance that results in an outage as “non intrusive”). A failure in a single bypass switch caused an hour-long power outage.
Even setting aside the obvious single point of failure (one UPS, one bypass switch), this is clearly an engineering failure. Any operation of a switch or control that changes the operating state of the equipment is intrusive by definition, and it must be treated as a possible failure point in the maintenance and examined for risk and recovery.
Recent high-profile data center power failures have left me wondering about the state of data center power reliability, especially since they’ve mostly been self-inflicted through lack of redundancy and poor all-around engineering.
Somebody’s playing fast and loose with things.
Vern, SwiftWater Telecom
Data center, telecom, and engineering services.