Friday data center tidbits.

December 18, 2009

The buzz this morning is about Twitter being “hacked”. Far from the actual site being hacked, this appears to have been a simple case of DNS cache poisoning, a well-known vulnerability of older versions of DNS server software. The disturbing thing about this is that Twitter isn’t paying attention to updating software for widely known holes like this, which is not what you’d expect from a major IT company. What else isn’t being kept up to date?

Research In Motion decided they needed to get in on the action before the end of the year and blow up BlackBerry email during another maintenance gone awry. With the number of major outages on the net this year resulting from maintenance activities, I think most of these big companies need a refresher course in how to plan a maintenance without screwing it up.

12 Days of Data Center Christmas, 25GB of remote backup storage for just $20!

Vern, SwiftWater Telecom


Wednesday data center tidbits.

December 17, 2009

First up today is about a data center in Phoenix covering its roof with solar panels. I love to see these solutions that give two benefits. Not only do you get the power but you also get the shading of the roof by the panels. A large flat roof is one of the worst offenders for absorbing heat. Shade the roof, reduce the heat infiltration, reduce the cooling need of the data center.

Second up is an article about optimizing servers for performance per watt. I agree that a ridiculously unbalanced server (CPU, memory, storage performance) will result in a seriously underperforming server for the amount of power consumed. I do disagree that the primary reason for idle CPUs in the data center is this imbalance. The primary reason for underutilized CPUs is dedicated servers running workloads of only 5-10%. The answer to that, of course, is virtualization.
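To put rough numbers on the consolidation argument, here is a minimal sketch. It assumes every dedicated server and every virtualization host has the same CPU capacity, and the 60% target utilization and the server counts are made-up illustration values, not measurements.

```python
import math

def hosts_needed(server_count, avg_utilization_pct, target_utilization_pct):
    """Estimate how many hosts can absorb a set of lightly loaded servers.

    Assumes identical CPU capacity everywhere, which is a simplification
    for illustration only.
    """
    # Total work expressed in "percent of one server" units.
    total_load_pct = server_count * avg_utilization_pct
    return math.ceil(total_load_pct / target_utilization_pct)

# Hypothetical example: 40 dedicated servers averaging 7.5% CPU,
# consolidated onto hosts that are allowed to run at 60%.
print(hosts_needed(40, 7.5, 60))  # prints 5
```

Under those assumptions, 40 mostly idle boxes collapse onto 5 busy ones, which is where the performance per watt gain really comes from.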

Vern, SwiftWater Telecom


Xen Cloud Platform command of the week.

December 16, 2009

This post is my once a week effort to detail useful commands in the Xen Cloud Platform virtual cloud computing system that aren’t included in the manuals.

Today’s command is xe host-evacuate. The host-evacuate command has the form xe host-evacuate uuid=. As the name suggests, host-evacuate migrates all the virtual machines off an XCP host.

This command could be used to take a host down for service or to migrate VMs before a host’s backup power expires. I use it as part of the procedure to automatically condense the cloud during low usage periods and shut down unneeded hosts.
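For anyone scripting this, here is a minimal sketch of wrapping the command in Python. The function names are hypothetical, and running xe host-disable first (so no new VMs land on the host while it drains) is an assumption added for this example, not part of the procedure described above.

```python
import subprocess

def evacuation_commands(host_uuid):
    """Return the xe invocations used to drain one host, in order."""
    return [
        # Assumption: disable the host first so nothing new is placed on it.
        ["xe", "host-disable", f"uuid={host_uuid}"],
        # Migrate every running VM off the host.
        ["xe", "host-evacuate", f"uuid={host_uuid}"],
    ]

def evacuate_host(host_uuid, run=subprocess.run):
    """Run the drain sequence, stopping at the first command that fails."""
    for command in evacuation_commands(host_uuid):
        # check=True raises CalledProcessError on a non-zero exit status.
        run(command, check=True)

# Usage on a pool member (the UUID is a placeholder):
# evacuate_host("HOST-UUID-GOES-HERE")
```

Passing the runner in as a parameter is just a convenience so the sequence can be dry run without touching a live pool.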

12 days of data center Christmas #12, cloud powered virtual machines

12 days of data center Christmas #11, DC power engineering specials

Vern, SwiftWater Telecom


Tuesday data center tidbits.

December 16, 2009

Today we have the story about RockYou getting 32 million clear text passwords and email addresses stolen. Congrats guys, you win the bozos of the month award. And you wonder why IT in general keeps having credibility problems.

The 12 Days of Data Center Christmas, cloud virtual servers!

Vern, SwiftWater Telecom


The virtual data center cloud and predictions of doom.

December 14, 2009

This afternoon, I had a link forwarded to me about Mark Anderson predicting dire doom and catastrophe for “the cloud”. I’m not sure I know where to start on this one.

First of all, one of the big red flags that someone may not have a clue what they’re talking about is referring to “the cloud”. Microsoft has a cloud. Amazon has a cloud. SwiftWater Telecom has a cloud (that’s us :) ). There is no huge Internet wide entity called “the cloud”. Service providers’ clouds are all independent and not subject to impact from one provider’s problems.

Second, just what is the nature of this major catastrophe that’s supposed to kill everything? Will all the wires in the Internet melt? It’s easy to make grandiose dire predictions when you don’t have to be very specific (many of the best known apocalyptic prophets of history have been similarly vague on the details as well).

Most of the failures of the cloud thus far fall into two categories, infrastructure engineering errors and bozo human operational gaffes. Coincidentally, these are the same things that whack all data center services from time to time. The cloud isn’t any more susceptible to this than traditional services. Ignore long standing good system administration rules, such as appropriate data backups, and you’re playing Russian roulette, exactly as you would be with a colocated server.

As much as I’ve snapped on Microsoft and Amazon for botched engineering, operations, and customer relations in their recent cloud outages, these serve to demonstrate just how durable cloud services really are, since they only affected a very tiny portion of the provider’s cloud. Kill one of our cloud hosts and the rest automatically take up the slack.

As the engineer and architect of our cloud, I can tell you it would take an external disaster of epic proportions to knock any significant portion of our cloud off line for an appreciable time. Assuming a “smoking hole” disaster in data center one and the loss of cloud one, simply connect to data center two, restart all the back up copies of the virtuals on cloud two, and we’re off and running again.

Even better, if you can see it coming, such as a severe weather event, migrate the VMs across the net to a different data center out of the path of danger. Try to do that with a colocated server.

Finally, there’s the “security breach” disaster. I have news for Mr Anderson. Large numbers of security breaches happen every day on traditional services. A cloud (you’ll notice I didn’t say “the cloud”) isn’t any more susceptible to this than any other data center service. Well established good system and network administration practices and prompt updating of software are still the best defenses against security breaches.

In short, if someone wants to provide me with specific details supporting the prediction that my cloud, or any other provider’s, will fail catastrophically next year, I’ll be happy to listen. Meanwhile, Mr Anderson should seek a refund on the goat entrails he used for this “prediction”. I think they were past their expiration date.

Vern, SwiftWater Telecom

SwiftWater virtual cloud service


Data center questions and answers.

December 11, 2009

I’ve culled these data center related questions from search engine traffic to the blog this week.

The first question is about 277VAC power. 277VAC is the low voltage component of 480Y277 (hot to neutral). 277 is used to provide lighting in industrial plants that generally use 480 3 phase power. Since most data center equipment universal power supplies top out at 250VAC, 277 really isn’t useful for anything except lighting.
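For the curious, the 277 figure falls straight out of the 3 phase arithmetic. A quick sketch, using only the nominal 480 volt service and the 250VAC supply limit mentioned above:

```python
import math

LINE_TO_LINE_VOLTS = 480.0   # nominal 3 phase service voltage
SUPPLY_MAX_VOLTS = 250.0     # upper limit quoted above for typical supplies

# In a wye (Y) system, hot to neutral voltage is line to line / sqrt(3).
line_to_neutral = LINE_TO_LINE_VOLTS / math.sqrt(3)

print(f"hot to neutral: {line_to_neutral:.0f} VAC")
print(f"within supply range: {line_to_neutral <= SUPPLY_MAX_VOLTS}")
```

The result lands at about 277 volts, comfortably over what a 250VAC rated supply is built to take.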

The second question is about using 12VDC as DC power in the data center, vs 48VDC. 12VDC would require four times the current to deliver the same power as 48VDC, meaning a larger power plant, larger batteries, and larger conductors. Add to this that 12VDC is a non-standard voltage for this use and little to no data center equipment is going to support it. Unless you’re Facebook or Google and you can afford to have custom equipment made, stick to 48VDC.
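Here is a small sketch of the arithmetic behind that. The 1200 watt load and the 0.01 ohm conductor resistance are hypothetical values picked to keep the numbers round, not figures from a real installation.

```python
def current_amps(load_watts, bus_volts):
    """Current needed to deliver a given power at a given bus voltage (I = P / V)."""
    return load_watts / bus_volts

def conductor_loss_watts(load_watts, bus_volts, conductor_ohms):
    """Power burned off in the conductors themselves (I squared times R)."""
    amps = current_amps(load_watts, bus_volts)
    return amps * amps * conductor_ohms

LOAD_WATTS = 1200.0      # hypothetical rack load
CONDUCTOR_OHMS = 0.01    # hypothetical round trip cable resistance

for volts in (12.0, 48.0):
    amps = current_amps(LOAD_WATTS, volts)
    loss = conductor_loss_watts(LOAD_WATTS, volts, CONDUCTOR_OHMS)
    print(f"{volts:.0f} VDC: {amps:.0f} A, {loss:.2f} W lost in the cable")
```

With the same cable, the 12VDC case carries 4 times the current and loses 16 times the power, which is why the conductors have to get so much bigger.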

The next question is what the best DC power plant batteries are. Rather than answer that, I’ll run down the criteria. Data center DC power plant batteries should be sealed (to prevent off gassing of flammable hydrogen), constructed to withstand possible deep discharge cycles without damage, and have low sulfur content (to extend the battery life and prevent premature failure from sulfation). I usually use common sealed deep discharge 120AH batteries in the strings.

The final question is server room separation distance from the electrical equipment. This is a timely question with recent catastrophic electrical failures in data centers. The only thing in data center electrical equipment that represents a significant threat is an oil filled transformer and those should NEVER be placed inside the facility due to the risk of sudden catastrophic failure. In general, the server equipment should be as close as possible to the power equipment to reduce power loss in the cables, especially if you’re running a 48VDC power plant to supply your servers.

Vern, SwiftWater Telecom

data center cloud computing


The data center, the cloud, and power failure.

December 11, 2009

Today we got the story about Amazon’s outage of part of its EC2 cloud. In this post, I’ll examine what happened and what you can do to avoid the same fate.

Data center power distribution is even more critical today due to the spread of the data center cloud. Due to the condensing effect of the cloud, failure of a poorly designed power system can wipe out far more customer service than ever before.

The failure of the EC2 section was caused by the failure of one PDU, then the failure of the backup while the first one was being repaired. Two almost simultaneous failures of PDUs make me wonder if Amazon is buying them from a shady guy in a back alley (hey mister, ya vant to buy a PDU?).

The problem here was that the reliability of the PDUs wasn’t properly characterized. In order to make the decision of n+x (n+1, n+2), you have to determine what the chances are of multiple failures leaving you without enough capacity to operate. You need enough redundancy to be sure you can repair a failure before having a second one. The sad part of this is that this is just another PR black eye for cloud reliability (especially when you add in 5 hrs to restore customer workload).
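One rough way to characterize that risk is sketched below. It assumes PDU failures are independent and that each unit is out of service for a fixed fraction of the time, derived from its mean time between failures and mean time to repair. The function names and the MTBF and MTTR figures are hypothetical, chosen only to show the shape of the calculation.

```python
from math import comb

def unit_unavailability(mtbf_hours, mttr_hours):
    """Long run fraction of time a single unit is down for repair."""
    return mttr_hours / (mtbf_hours + mttr_hours)

def prob_insufficient_capacity(n_required, x_spares, unavailability):
    """Chance that more than x of the n+x units are down at the same moment.

    Assumes independent failures (no shared cause), which real power
    systems do not always honor.
    """
    total = n_required + x_spares
    return sum(
        comb(total, down)
        * unavailability ** down
        * (1 - unavailability) ** (total - down)
        for down in range(x_spares + 1, total + 1)
    )

# Hypothetical PDU: fails about once in 50,000 hours, takes 24 hours to repair.
q = unit_unavailability(50_000, 24)
for spares in (0, 1, 2):
    risk = prob_insufficient_capacity(4, spares, q)
    print(f"n+{spares}: chance of being short on capacity = {risk:.3e}")
```

Stretch the repair time or shrink the MTBF and you can watch n+1 stop being enough, which is the characterization work that has to happen before picking x.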

Amazon swings, and they whiff.

Vern, SwiftWater Telecom


The data center silo and missed opportunities.

December 11, 2009

Tonight I’ve been reading about the CLUMEQ data center silo in Quebec. Great reuse of an existing facility but one less than efficient oddity.

I’ve talked a bit in previous posts about the idea of embodied carbon. Embodied carbon is the carbon that was released in manufacturing an item and that the item now represents. It’s hard to imagine a much bigger source of embodied carbon than an existing building. The reuse of a building means that there isn’t any further release of carbon as the result of replacing the building.

The challenge of reusing a building for a data center is when the building was designed for a specific limited purpose as this one was. Odd shapes, odd layouts, these all can create a nightmare of logistics and airflow.

The reuse of this odd silo is nearly brilliant. Using circular placement of the equipment to take advantage of the open center of the building as an air plenum is a perfect example of using the oddities in your favor.

So where is the wart on the nose of an otherwise great looking facility? It’s simple. Cold air naturally sinks and hot air naturally rises. Since they chose to place the cooling systems in the basement, they have to move cold air up from the basement and hot air back down. This means that they are using energy to pump air against its normal movement, instead of using the energy from the waste heat to move air without adding any extra energy. A perfect bad example of a wasted green data center opportunity.

Valiant attempt, glaring miss.

Vern, SwiftWater Telecom


Thursday data center tidbits.

December 10, 2009

I see that parts of Amazon’s EC2 data center cloud were down with power failures yesterday. I’ve written quite a bit on how not to allow power problems to wipe out your data center or cloud (apparently nobody is listening), but 5 hours to restore customer virtuals? Yeek!

The word for the day is “greenwashing”. Does your green data center walk the walk?

Vern, SwiftWater Telecom

data center virtual cloud computing


The reliable data center cloud.

December 10, 2009

Today’s post comes from reading an article about cloud computing and data center efficiency. I’m going to talk about one of the issues touched on, reliability.

The cloud implementations of some of the early major players have had large and public failures, in some cases, repeated. The distributed nature of the data center cloud lends itself to a number of fairly simple steps to ensure reliability.

So, how do we help prevent data center cloud burps? The first thing is to segment the infrastructure the cloud depends on. Split power feeds between fairly small segments of the cloud to reduce the impact of power problems.

Segment the cloud network and provide redundant network paths, especially in the storage network. The storage network is the heart of the cloud and it has to be as rock solid and fault tolerant as possible.

Cooling is another area to segment. Is there a pattern showing here? The secret to data center cloud reliability is to segment so that any infrastructure failure impacts as small an amount of the cloud as possible. Minimizing the impact allows the cloud to be restored to full operation in case of failure. Operating degraded is also far better than not operating at all!
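To make the segmenting argument concrete, here is a minimal sketch. It assumes hosts are spread round robin across independent infrastructure segments (power feeds, for example) and that exactly one segment fails. The host and segment counts are made-up illustration values.

```python
def surviving_fraction(host_count, segment_count, failed_segment=0):
    """Fraction of cloud hosts still running after one segment fails.

    Hosts are assigned to segments round robin, an assumption made here
    purely to keep the example simple.
    """
    if host_count < 1 or segment_count < 1:
        raise ValueError("need at least one host and one segment")
    lost = sum(
        1 for host in range(host_count)
        if host % segment_count == failed_segment
    )
    return (host_count - lost) / host_count

# Hypothetical 24 host cloud split across 1, 2, 4, or 8 power feeds.
for segments in (1, 2, 4, 8):
    share = surviving_fraction(24, segments)
    print(f"{segments} segment(s): {share:.1%} of hosts survive one failure")
```

One big segment means one failure takes out everything, while eight small ones leave the cloud running degraded on most of its hosts.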

The other advantage of segmenting is to reduce the possibility of cascading failures. Since the cloud is so interconnected, any failure that ripples through the cloud would be catastrophic. Now you’ve gone from an almost unnoticeable impact to major impact and extensive restore time.

The nature of the data center cloud not only increases energy efficiency but also “should” benefit reliability. Individual failures will still happen but the overall effect on the cloud will be limited.

So, why the high profile cloud failures? Lack of storage redundancy, inviting cascading failures (multiple Google outages), inviting human error into things like power without segmentation (short a PDU and half the data center chokes).

Sometimes, the best even the big boys can do is serve as a bad example.

Vern, SwiftWater Telecom