Tonight’s post comes courtesy of the article “Only fools do N+1.” I’m not a fool, but I’ll examine where its reasoning goes wrong.
First of all, just what is N? N is the amount of equipment required to support operation. So what is N+1? It’s a redundancy formula that calls for the equipment needed to support operation plus one spare. This allows one unit to fail without disrupting service.
The source article claims that anything less than 2N (completely separate, duplicate sets of equipment) is unacceptable. That kind of redundancy is great for a nuclear power plant or a hospital, but it’s massive overkill and a waste of money for the rest of us.
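To make the terms concrete, here is a minimal Python sketch that sizes equipment under each scheme. The 180 kW load and 50 kW unit size are hypothetical figures, and the 2N count covers only the units themselves; a true 2N design also duplicates the distribution path, so its real cost runs higher than the unit count suggests.

```python
import math


def units_needed(load_kw: float, unit_capacity_kw: float) -> int:
    """N: the smallest number of units whose combined capacity covers the load."""
    return math.ceil(load_kw / unit_capacity_kw)


def units_installed(load_kw: float, unit_capacity_kw: float, scheme: str) -> int:
    """Total units installed under a redundancy scheme (unit count only)."""
    n = units_needed(load_kw, unit_capacity_kw)
    if scheme == "N+1":
        return n + 1      # one spare unit
    if scheme == "N+2":
        return n + 2      # two spare units
    if scheme == "2N":
        return 2 * n      # a complete second set (distribution not counted here)
    raise ValueError(f"unknown scheme: {scheme}")


# Hypothetical example: 180 kW of load served by 50 kW units, so N = 4.
for scheme in ("N+1", "N+2", "2N"):
    print(scheme, units_installed(180, 50, scheme))
```

In this example N+1 adds one unit to the four needed, while 2N adds four, before counting any of the duplicated wiring and switchgear.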
The first myth is that a failure in an N+1 system typically cascades into more than one failure. By definition, this doesn’t happen, since the remaining N units can carry the load by themselves. The real decision is how likely a second failure is during the time it takes to repair the first. If repairs take a long time and another failure is plausible in that window, go with N+2 or more to ensure coverage.
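One way to put numbers on that trade-off is a simple reliability model. The sketch below assumes independent units with a constant failure rate (exponentially distributed lifetimes at rate 1/MTBF) and asks how likely it is that enough additional units fail during a single repair window to drop capacity below N. The function name and the MTBF and MTTR figures are illustrative assumptions, not vendor data.

```python
import math


def p_outage_during_repair(n: int, spares: int, mttr_hours: float, mtbf_hours: float) -> float:
    """Probability that, after one unit has failed, enough further units fail
    before that repair finishes to leave fewer than n running.

    Assumes independent units, exponential lifetimes (rate 1/mtbf_hours),
    and that no later failure is repaired within the same window.
    """
    if spares < 1:
        raise ValueError("need at least one spare")
    survivors = n + spares - 1                      # units still running after the first failure
    p = 1.0 - math.exp(-mttr_hours / mtbf_hours)    # chance one unit fails within the repair window
    # Service drops if `spares` or more survivors fail: P(X >= spares), X ~ Binomial(survivors, p).
    p_fewer = sum(
        math.comb(survivors, i) * p**i * (1 - p) ** (survivors - i)
        for i in range(spares)
    )
    return 1.0 - p_fewer


# Hypothetical figures: N = 4, unit MTBF of 100,000 hours.
# Compare a 4-hour repair (spare on site) with a 72-hour repair (waiting for a part).
for mttr in (4, 72):
    for spares in (1, 2):
        p = p_outage_during_repair(n=4, spares=spares, mttr_hours=mttr, mtbf_hours=100_000)
        print(f"MTTR {mttr:>3} h, N+{spares}: {p:.2e}")
```

With one spare the result reduces to 1 - exp(-N * MTTR / MTBF), so for small values the risk grows roughly in proportion to repair time. A second spare requires two more failures inside the same window, which is why N+2 makes sense when parts or technicians are far away.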
The second argument is that an N+1 system has at least one single point of failure in it. That may be true in some configurations, but what matters is how likely that single point is to fail.
As an example, take the DC power plant I use in the data center. It is modular, so I run one extra module for redundancy. Is there any single component in this unit that could interrupt service? The buss bars. I consider the odds of a buss bar failing too small to justify the massive expense of a complete second system. No other single failure in the installation will interrupt service, not even the loss of the single AC power feed.
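The single-point-of-failure question can be weighed the same way. In this sketch the module group and the buss bar are treated as components in series: the plant is up only when the buss bar is up and at least N modules are up. The function names and all availability figures are hypothetical, and the model assumes independent failures and repairs for every component.

```python
import math


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable component."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def k_of_n_availability(k: int, n: int, a: float) -> float:
    """Probability that at least k of n independent, identical units are up."""
    return sum(math.comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def plant_availability(modules_needed: int, spares: int, a_module: float, a_bus: float) -> float:
    """Redundant module group in series with a single, non-redundant bus.

    Series availabilities multiply: the plant is up only if the bus is up
    AND enough modules are up.
    """
    group = k_of_n_availability(modules_needed, modules_needed + spares, a_module)
    return a_bus * group


# Hypothetical figures: 4 modules needed plus 1 spare, each with a
# 100,000-hour MTBF and a 4-hour MTTR; two guesses at the bus availability.
a_module = availability(mtbf_hours=100_000, mttr_hours=4)
for a_bus in (0.99999999, 0.9999):
    print(f"bus {a_bus}: plant {plant_availability(4, 1, a_module, a_bus):.8f}")
```

Because series availabilities multiply, a buss bar whose availability sits far closer to 1 than the module group’s barely moves the total. If the bus figure were comparable to the module group’s, the case for a second system would be much stronger.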
And I did it without wasting a ton of cash on a full duplicate system.
Vern, SwiftWater Telecom
Data center, web hosting, Internet engineering