Tonight’s post comes from today’s episode in troubleshooting. Sometimes I wish I knew how to use the Force.
I’m a great believer that, even with all the fancy monitoring and diagnostic tools available in the data center, the best troubleshooting tools are the tech’s own senses: sight, smell, and hearing. Sometimes, however, even the fancy tools don’t help.
Today’s episode occurred during server work in a “micro” data center consisting of 3 cabinets equipped with a small DC power plant, a router, several Ethernet switches, a NAS, and a wild mixture of servers. I was working away at the server console when I heard a distinct snap and sizzle and detected the acrid odor of catastrophically failed electronics. This set in motion an immediate search for the source of the failure.
The unusual part of the failure is that every piece of equipment in the installation remained operational; normally this kind of fault produces a total equipment failure. Add to that the fact that the smoke disappeared quickly in the cooling air flow, and we have a challenge.
So how do we track this down? Due to time limits, I couldn’t inspect everything. I removed and inspected each module of the DC power shelf (n+1!), then inspected all the power gear, the networking gear, and the critical servers. Non-critical servers were shut down until they could be opened and inspected for damage in the morning.
It’s always nice when you can spot a problem before it affects customers, so get familiar with the sights, sounds, and smells of your data center and you’ll be surprised what you can spot before the NOC does!
Vern, SwiftWater Telecom
data center, web hosting, Internet engineering