Skip to content

Technical issue? You might be overthinking it

Blog
  • Once in every while, you encounter a repetitive issue that no matter what you try to do to resolve it, the problem manifests itself over and over again - sometimes, even on a daily basis. Much of how the issue is remediated really depends on the person assigned to the task.

    You might be puzzled at why I’d write about something like this, but it’s a situation I see constantly - one I like to refer to as “over thinker syndrome”. What do I mean by this ? Here’s the theory. Some people are very analytical when it comes to problem solving. Couple that with technical knowledge and you could land up with a situation where something relatively simple gets blown out of all proportion because the scenario played out in the mind is often much further from reality than you’d expect. And the technical reasoning is usually always to blame. Sometime around 2007, a colleague noticed that the Exchange Server (2003 wouldn’t you know) would suddenly reboot half way through a backup job. Rightly so, he wanted to investigate and asked me if this would be ok. Anyone with an ounce of experience knows that functional backups are critical in the event of a disaster - none more so than I - obviously, I have the go ahead. One bright spark in my team suggested a reboot of the server, which immediately prompted the response

    “…it’s rebooting itself every day, so how will that help ?”

    The investigation

    Joking aside, we’ve all heard the “have you rebooted” question touted at some point during helpdesk discussions, but this one was different. A system rebooting itself is usually symptomatic of an underlying issue somewhere, and my team member was ready for the task ahead. Stepping up to the plate, he asked if it was ok to install some monitoring software on the server. Usually, installing additional software components in a production server without testing first is a non-starter, but seeing as we needed to get this resolved as quickly as possible to reinstate the nightly backup (which incidentally hasn’t run successfully for 3 days by now), I provided approval to proceed without question. There’s a leap of faith at this point, as you could cause more problems than that you actually set out to resolve in the first place, but, as with anything related to information technology, someone’s you have to accept an element of risk. The software itself was actually for the RAID controller and motherboard  The assigned technician had already decided it was related to something along the lines of a faulty RAM module, or perhaps an issue with the controller itself. My thoughts leaned elsewhere already at this point - is the server reboots itself at exactly the same time every day then there is an established pattern which should be investigated first. It’s a logical approach, but it’s a common trait for technical support staff to sometimes think outside of the box - or in this case, outside of the building. Not wanting to push my opinion, or trample on anyone’s toes, I decided to remain quiet and see just how far this would go before intervention was required.

    In this case, not very far. The following morning after another unannounced nightly reboot, the error “the previous shutdown at [insert time and date here] was unexpected” showed up in the event log. No real surprises there, and once again, exactly the same time as the previous night. I asked my technician for an update, and he informed me that he believed that the memory was faulty and somehow causing the server to blue screen and reboot. That was actually a reasonable response and so I commended him on his research and findings, but also reminded him to perform a manual backup so that we at least had something to revert to in the event of a failure. Later that afternoon, the same tech approached me and said that he had ordered some replacement memory, and wanted to arrange downtime to fit it. Trying to keep a poker face and remain passive, I agreed and the memory was replaced the same evening around 10pm. At 2am the following morning, kaboom ! - the server rebooted itself again. Not wanting to admit defeat, our courageous tech suggested that the problem could be due to the system overheating. Another fair point, but not realistic as you’d see this in event log as a thermal shutdown. I willingly entertained this, and allowed investigations into the CPU temperature to begin - after another manual backup. Unsurprisingly, the temperature data returned no smoking gun, so that was abandoned. The next port of call was to reapply the service pack. Now, I’ll admit that this used to fix a multitude of issues under Windows NT Server (particularly Service Pack 4) but not under Windows 2003. I declined this for obvious reasons - if you reapply the service pack, you run the risk of overwriting key DLL files that could (and often will) render Exchange inoperable. Not being prepared to introduce an unprecedented risk into what was already becoming something of a showcase, I suggested that we look elsewhere.

    The exasperation

    The final (and honestly more realistic suggestion) was to enable verbose logging in Exchange. This is actually a good idea, but only if you suspect that the information store could be the issue. Given the evidence, I wasn’t convinced. If there was corruption in the store, or on any of the disks, this would show itself randomly through the day and wouldn’t wait until 2am in the morning. Not wanting to come across as condescending, I agreed, but at the same time, set a deadline to escalation. I wasn’t overly concerned about the backups as these were being completed manually each day whilst the investigations were taking place. Neither was I concerned at what could be seen at this point as wasting someone’s time when you think you may have the answer to what now seemed to be an impossible problem. This is where experience will eclipse any formal qualifications hands down. Those with university degrees may scoff at this, but those with substantially analytical thinking patterns seem to avoid logic like the plague and go off on a wild tangent looking for a dramatically technical explanation and solution to a problem when it’s much simpler than you’d expect. Hence the title of this article - Avoid the “bulldozer to find a china cup” scenario. After witnessing another pained expression on the face of my now exasperated and exhausted tech, I said “let’s get a coffee”. In agreement, he followed me to the kitchen and then asked me what I thought the problem could be. I said that if he wanted my advice, it would be to step back and look at this problem from a logical angle rather than technical. The confused look I received was priceless - the guy must have really though I’d lost the plot. After what seemed like an eternity (although in reality only a few seconds) he asked me what I meant by this. “Come with me”, I said. Finishing his coffee, he diligently followed me to the server room. Once inside, I asked him to show me the Exchange Server. Puzzled, he correctly pointed out the exact machine. I then asked him to trace the power cables and tell me where they went.

    As with most server rooms, locating and identifying cables can be a bit of a challenge after equipment has been added and removed, so this took a little longer than we expected. Eventually, the tech traced the cables back to

    …an old looking UPS that had a red light illuminated at the front like it had been a prop in a Terminator film.

    The realisation

    Suddenly, the real cause of this issue dawned on the tech like a morning sunrise over the Serengeti. The UPS that the Exchange Server was unexpectedly connected to had a faulty battery. The UPS was conducting a self test at 2am each morning, and because the bypass test failed owing to the burnt battery, the connected server lost power and started back up after the offending equipment left bypass mode and went online.

    Where is this going you might ask ?  Here’s the moral of this (particular, and many others like it) story

    • Just because a problem involves technology, it doesn’t mean that the answer has to be a complex technical one
    • Logic and common sense has a part to play in all of our lives.
    • Sometimes, it makes more sense just to step back, take a breath, and see something for what it really is before deciding to commit
    • It’s easy to allow technical expertise to cloud your judgement - don’t fall into the trap of using a sledgehammer to break an egg
    • You cannot buy experience - it’s earned, gained, and leaves an indelible mark

    Let’s hear your views. Did you ever come across a situation where no matter what you tried, nothing worked ? Did the solution turn out to be much simpler than you’d have ever thought ?

  • phenomlabundefined phenomlab referenced this topic on

Related Topics
  • Blog Setup

    Solved Customisation
    17
    8 Votes
    17 Posts
    1k Views

    Here is an update. So one of the problems is that I was coding on windows - duh right? Windows was changing one of the forward slashes into a backslash when it got to the files folder where the image was being held. So I then booted up my virtualbox instance of ubuntu server and set it up on there. And will wonders never cease - it worked. The other thing was is that there are more than one spot to grab the templates. I was grabbing the template from the widget when I should have been grabbing it from the other templates folder and grabbing the code from the actual theme for the plugin. If any of that makes sense.

    I was able to set it up so it will go to mydomain/blog and I don’t have to forward it to the user/username/blog. Now I am in the process of styling it to the way I want it to look. I wish that there was a way to use a new version of bootstrap. There are so many more new options. I suppose I could install the newer version or add the cdn in the header, but I don’t want it to cause conflicts. Bootstrap 3 is a little lacking. I believe that v2 of nodebb uses a new version of bootstrap or they have made it so you can use any framework that you want for styling. I would have to double check though.

    Thanks for your help @phenomlab! I really appreciate it. I am sure I will have more questions so never fear I won’t be going away . . . ever, hahaha.

    Thanks again!

  • Nodebb as blogging platform

    General
    10
    5 Votes
    10 Posts
    1k Views

    @qwinter I’ve extensive experience with Ghost, so let me know if you need any help.

  • 0 Votes
    1 Posts
    364 Views
    No one has replied
  • 5 Votes
    4 Posts
    1k Views

    @crazycells I guess the worst part for me was the trolling - made so much worse by the fact that the moderators allowed it to continue, insisting that the PeerLyst coming was seeing an example by allowing the community to “self moderate” - such a statement being completely ridiculous, and it wasn’t until someone else other than myself pointed out that all of this toxic activity could in fact be crawled by Google, that they decided to step in and start deleting posts.

    In fact, it reached a boiling point where the CEO herself had to step in and post an article stating their justification for “self moderation” which simply doesn’t work.

    The evidence here speaks for itself.

  • 0 Votes
    1 Posts
    321 Views
    No one has replied
  • 131 Votes
    210 Posts
    17k Views

    Some people may be open to the idea but I certainly think it’s strange to say the least 😱.

  • 0 Votes
    1 Posts
    445 Views
    No one has replied
  • 1 Votes
    1 Posts
    435 Views
    No one has replied