Tag Archives: Google

Tuesday data center tidbits: got one! and money for nothing.


First up today is the piece about Google having to bury their data center fiber because hunters shoot it off the poles. Ah yes, beered-up yahoos running loose in the woods with firearms, yet another criterion for choosing a data center location.

Next up is the piece where Gartner claims cloud brokering is the single largest revenue opportunity in cloud computing. If being a dubious value-added middleman is the way to make the most money, then cloud computing is a big fat #FAIL.

Email or call me or visit the SwiftWater Telecom web site for green data center services today.

Vern


Friday data center tidbits: shelling the servers, Google does one right.


First up today is the piece about the polycarbonate shield and plastic curtain for protecting data center cabinets from overhead leaks and debris. Not a bad idea at first glance, but I think this thing would horrendously impact airflow to and from the cabinets.

Second up is the piece about Google adding capacity for the launch of their new “Instant” feature. Kudos to Google for not only providing a good example by focusing on extreme efficiency in the way they use their existing data center capacity, but also managing to roll out a substantial upgrade without blowing it up (hey Digg, are you watching?).

Email or call me or visit the SwiftWater Telecom web site for green data center services today.

Vern


Friday data center tidbits: Intuit data center face plants, Google patents stacking, and more!


First up is the piece about the 36 hour failure of Intuit’s data center as the result of a power failure caused by “routine maintenance”. What is it with data centers that they can’t resist screwing with critical power facilities in the name of “routine maintenance”? This has been an ongoing theme in major data center outages for the last several years. Really, if your primary operating power system requires true “maintenance” (and not just BS things like measuring phase rotation in live panels just to check), then you should reconsider your design. Squeaky red noses to Intuit.

News of the ridiculous: Is it REALLY a patentable idea to stack data center containers?

Finally, there’s the next big data center money giveaway, to support Microsoft in Iowa. It must be nice to be that rich and still have state governments shower you with public money.

Email or call me or visit the SwiftWater Telecom web site for cloud computing services.

Vern


Keeping the cloud flying. #cloudcomputing


I was just reading an article by David Linthicum about combating cloud outages. While not really wrong, I think it misses the point about what it takes to really keep a cloud up and flying.

The “core issue” with cloud computing failures is NOT simply running over capacity. A quick look at the major failures over this year and last shows everything from human screwups (the Google Gmail failure from botched maintenance work in 2009) to endless lists of power related problems (4 failures of Amazon’s EC2 service in one week) to, yes, over-capacity issues (Google AppEngine, repeated Twitter failures).

The human caused cloud failures have been especially confounding. From failures of untested software to power equipment installed without anyone even bothering to check the configuration (one of the recent Amazon EC2 failures), the list of incompetent engineering and operations incidents is astonishing.

So what is the real core issue with cloud computing failures? Aside from the obvious screw-ups and foul-ups, the real issue is the magnifying effect of the cloud. The increased “population density” on the same hardware magnifies the effect of any failure.

Power fail one dedicated server back in the P.C. (pre-cloud) days and you took out one customer. Power fail a single cloud server and now you’ve knocked out 10 (or far more) customers. The failure modes aren’t significantly different in a cloud; the magnitude of the effect is.

So what is the solution?

1. Meticulous attention to detail in constructing, engineering, and operating the cloud. Take the human goofs out of the equation.

2. Never ever ever load any software into the cloud itself that hasn’t been tested thoroughly. This should be obvious but for some reason it isn’t (this is why we operate a “stunt cloud” in addition to the production cloud).

3. Segment the cloud infrastructure (power). No attention to detail is ever going to be perfect, so minimize the amount of the cloud a failure can take out.

4. Automate, automate, automate. Rebalance workloads to compensate for down infrastructure, detach down hosts, and restart their workloads on running hosts, automatically (a rough sketch of this kind of watchdog follows below).
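
To make point 4 a little more concrete, here’s a minimal sketch in Python of the kind of watchdog loop I mean. This is purely illustrative: the Host class, the health probe, and the restart call are hypothetical stand-ins for whatever your orchestration layer actually provides, not a description of our production system.

import time

HEALTH_CHECK_INTERVAL = 30   # seconds between sweeps
MAX_FAILED_CHECKS = 2        # consecutive misses before a host is declared down

class Host:
    def __init__(self, name):
        self.name = name
        self.workloads = []      # guest workloads currently running on this host
        self.failed_checks = 0

    def is_responding(self):
        # Hypothetical health probe; a real one would ping the hypervisor
        # agent or check a heartbeat timestamp.
        return True

def restart_workload(workload, target):
    # Hypothetical stand-in for the cloud controller's restart call.
    print("restarting %s on %s" % (workload, target.name))
    target.workloads.append(workload)

def least_loaded(hosts):
    # Simple rebalancing policy: place orphaned workloads on the host
    # currently carrying the fewest of them.
    return min(hosts, key=lambda h: len(h.workloads))

def watchdog_sweep(hosts):
    healthy = [h for h in hosts if h.failed_checks < MAX_FAILED_CHECKS]
    for host in list(healthy):
        if host.is_responding():
            host.failed_checks = 0
            continue
        host.failed_checks += 1
        if host.failed_checks < MAX_FAILED_CHECKS:
            continue
        # Host is down: detach it and restart its workloads elsewhere.
        healthy.remove(host)
        orphans, host.workloads = host.workloads, []
        if not healthy:
            print("no healthy hosts left, escalate to a human")
            return
        for workload in orphans:
            restart_workload(workload, least_loaded(healthy))

if __name__ == "__main__":
    fleet = [Host("node1"), Host("node2"), Host("node3")]
    while True:
        watchdog_sweep(fleet)
        time.sleep(HEALTH_CHECK_INTERVAL)

The design choice that matters here is that the sweep makes the detach-and-restart decision on its own, on a timer measured in seconds, instead of waiting for a human to notice the outage.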

On our cloud computing service, anything short of a smoking hole disaster automatically starts restoring workloads within 5 minutes, with an absolute maximum of 15 minutes until everything is restored. Compare that to the 7+ hour restore times for Amazon EC2 outages.

Notice I didn’t say anything about capacity here. Adding capacity to a cloud is one of the easiest and fastest things to do (we go from bare server to loaded and operating in the cloud in 10 minutes or less).

The real key to keeping the cloud flying is to minimize the goofs, limit the effect of the goofs, and automate a lightning fast response to the goofs that make it through despite the best intentions.

Cloud failures happen; it’s the response to them that makes the difference.

Email or call me or visit the SwiftWater Telecom web site for cloud computing services.

Vern


Wednesday data center tidbits.


First up today are some ponderings about whether it would be better to reuse existing suitable buildings for data centers rather than keep putting up new buildings just for the sake of being new. I’ve written quite a bit here on the blog about going as green as possible by taking advantage of the embodied carbon in existing buildings (millions of square feet of former mill space here in the East) and the unique features of many of these buildings that make them a great choice for data center use (incredibly strong, huge amounts of power, ideal for free air cooling). If you want to be really green, use what’s already here rather than cut down a forest to build yet another building.

Next up is the story about Google not claiming the tax incentives for its South Carolina data centers. Just another example of throwing gifts at huge corporations who don’t need them, only to get nothing out of it. I know I don’t have the celebrity status of Google, but I guarantee I could create a pile of good data center jobs for the local community with just a fraction of the Google goodie bags (hey Maine politicians and economic development folks, you listening?).

Email or call me or visit the SwiftWater Telecom web site for cloud computing services and green data center services.

Vern


How not to handle a data center power outage, starring Google.


Today’s post comes from the post-mortem of the Feb 24 data center power outage that took down Google’s AppEngine cloud computing service. The issue I see here is not the power outage itself, or the fact that it wiped out such a large chunk of Google’s service (although that is disturbing), but the response to it.

I don’t know why so many of the large data center operators can’t seem to hold their power infrastructure together. With the condensing of services brought about by cloud computing, the same size power outage today can impact far more services and customers than it used to. The flip side to engineering the power system for maximum reliability and segmenting the power so that individual failures only impact small parts of the service is having an efficient and sensible response when something unforeseen does happen.

The first problem with the response was that Google didn’t have anyone present who understood the system well enough to freestyle its recovery. When it comes right down to it, there’s no substitute for having someone who knows what’s really going on inside of things.

Having ordinary data center staff handle this kind of failure wouldn’t be a disaster, presuming that there was correct procedural documentation and the failure actually fit the procedures that were written. The problem with this approach is that things may not obligingly fail in the way that you’re prepared for. Of course, throw in a bunch of official procedures that haven’t been properly updated and now you have a real recipe for chaos.

The final point to this is judgment. The staff has to recognize that, if they’re not qualified to freestyle a fix to the problem, they’re probably not qualified to debug a written procedure that doesn’t work. There shouldn’t be any debate or question about it. If the failure is impacting customers and the written procedure doesn’t apply or doesn’t work, fail over to backup, immediately.

One of the worst things that you can do is engage in panic mode engineering (especially on something you’re not qualified for) while customers are down and you have an operational backup facility available. The priority is not to debug the original problem; the priority is to get the customers back in service. When the customers are happy again, then you have breathing space to straighten out the original snafu.

Make everything as solid and robust as you can, prepare for the worst just in case, but don’t lose sight that the real priority is the customer.

Check out our new cloud computing products today!

Vern


Thursday data center tidbits.


First up is the story about Google’s warehouse scale computing patterns. The idea that you’re going to design a data center for future applications is ridiculous when you have no idea what those applications are going to be. The key to this is a data center that’s flexible to the max and can change on a dime when requirements change. Build a data center for a 5 year lifespan and you’re stuck with either a forklift upgrade or abandoning it and starting over.

Second up is the salesforce.com cloud computing outage this morning. It’s not the cloud failure itself (it seems like all the major clouds have to play “me too” with those), it’s the idea that enough of their stuff was in one basket to blow it all out of the water.

Get a 30 day free trial of the SwiftWater Telecom cloud powered Total Business Server for small to medium business today!

Vern
