@crazycells this perhaps? 🙂
Ever heard of KISS ? Nope - not these guys
What I’m referring to is the acronym was reportedly coined by Kelly Johnson, lead engineer at the Lockheed Skunk Works (creators of the Lockheed U-2 and SR-71 Blackbird spy planes, among many others), which formed the basis of the relationship between the way things break, and the sophistication available to repair them. You might be puzzled at why I’d write about something like this, but it’s a situation I see constantly – one I like to refer to as “over thinker syndrome”. What do I mean by this ? Here’s the theory. Some people are very analytical when it comes to problem solving. Couple that with technical knowledge and you could land up with a situation where something relatively simple gets blown out of all proportion because the scenario played out in the mind is often much further from reality than you’d expect. And the technical reasoning is usually always to blame.
Some years ago in a previous career, a colleague noticed that the Exchange Server (2003 wouldn’t you know) would suddenly reboot half way through a backup job. Rightly so, he wanted to investigate and asked me if this would be ok. Anyone with an ounce of experience knows that functional backups are critical in the event of a disaster – none more so than I – obviously, I gave the go ahead. One bright spark in my team suggested a reboot of the server, which immediately prompted the response
“……it’s rebooting itself every day, so how will that help ?”
There’s always one, isn’t there ? The final (and honestly more realistic suggestion) was to enable verbose logging in Exchange. This is actually a good idea, but only if you suspect that the information store could be the issue. Given the evidence, I wasn’t convinced. If there was corruption in the store, or on any of the disks, this would show itself randomly through the day and wouldn’t wait until 2am in the morning. Not wanting to come across as condescending, I agreed, but at the same time, set a deadline to escalation. I wasn’t overly concerned about the backups as these were being completed manually each day whilst the investigations were taking place. Neither was I concerned at what could be seen at this point as wasting someone’s time when you think you may have the answer to what now seemed to be an impossible problem. This is where experience will eclipse any formal qualifications hands down. Those with university degrees may scoff at this, but those with substantially analytical thinking patterns seem to avoid logic like the plague and go off on a wild tangent looking for a dramatically technical explanation and solution to a problem when it’s much simpler than you’d expect.
After witnessing the pained expression on the face of my now exasperated and exhausted tech, I said “let’s get a coffee”. In agreement, he followed me to the kitchen and then asked me what I thought the problem could be. I said that if he wanted my advice, it would be to step back and look at this problem from a logical angle rather than technical. The confused look I received was priceless – the guy must have really though I’d lost the plot. After what seemed like an eternity (although in reality only a few seconds) he asked me what I meant by this. “Come with me”, I said. Finishing his coffee, he diligently followed me to the server room. Once inside, I asked him to show me the Exchange Server. Puzzled, he correctly pointed out the exact machine. I then asked him to trace the power cables and tell me where they went.
As with most server rooms, locating and identifying cables can be a bit of a challenge after equipment has been added and removed, so this took a little longer than we expected. Eventually, the tech traced the cables back to
………an old looking UPS that had a red light illuminated at the front like it had been a prop in a Terminator film.
Suddenly, the real cause of this issue dawned on the tech like a morning sunrise over the Serengeti. The UPS that the Exchange Server was unexpectedly connected to had a faulty battery. The UPS was conducting a self test at 2am each morning, and because the bypass test failed owing to the burnt battery, the connected server lost power and started back up after the offending equipment left bypass mode and went online.
Where is this going you might ask ? Here’s the moral of this particular story