Monitoring Headaches?

When infrastructure monitoring works, the business rarely notices; the monitoring just quietly does its job…

But when faults arise and affect the business, all hell can break loose…

So many companies, I have discovered over the years, don’t have a clear view of what they monitor in their IT estate, how they monitor it, or indeed whether they monitor everything they should be monitoring in the first place…

This can create a headache… just one of those niggling little aches we all ignore from time to time and hope they will just go away...

Or they might have multiple tools, multiple teams, multiple managers - all putting their multiple opinions and requirements into the monitoring melting pot…

And this can create a real headache that won’t go away without intervention

The pain can be alleviated though of course… with the right diagnosis… and the right treatment...

And it’s not always the monitoring system itself that’s to blame...

Common symptoms I often see include:

  • No alerts were received when the fault occurred...
  • Alerts were received but weren’t actioned properly...
  • Alerts were simply ignored...
  • Maybe there were too many alerts and operators were overwhelmed - they “couldn’t see the wood for the trees”…
  • Maybe they just didn’t know what to do with the alerts...
  • The tools couldn’t effectively monitor what was required...
  • Maybe nobody asked for that fault condition to be monitored in the first place...
  • Maybe the processes & procedures failed...

Sometimes it can be just one or two of these symptoms; often it can be many, or even all of them…

And that can cause a real migraine

Many organisations evolve, of course, ending up with multiple disparate or overlapping systems. Only a few I have spoken to actually realise that the problem is not the tools themselves; it’s often the monitoring function itself that is the true cause of the problems they are facing…

Blaming one particular tool and simply “buying more” doesn’t fix the underlying issues

Some organisations add to their toolsets and then don’t (or can’t or won’t) consolidate the remaining tools, often leaving them with a bigger headache than before...

It’s like putting a band-aid on a splitting headache

Diagnosis is critical

Correct treatment is essential

Solving the underlying issues and establishing firm requirements is often what’s needed here:

  • Assess the current tools, monitoring, people, processes & procedures…
  • Identify quick-wins to relieve the immediate pressure…
  • Identify monitoring gaps and plug them where possible, as quickly as possible (see the sketch after this list)…
  • Define the monitoring requirements...
  • Determine if the current environment can meet the requirements… and plan accordingly...
  • Refine the current monitoring environment to meet the requirements and re-identify any gaps...
  • Plan a consolidation and/or migration strategy if required...
  • Ensure monitoring is defined, implemented, documented, supported and trusted by all…
  • Ensure tools, people, processes and procedures support the monitoring function at all times…
  • Re-assess continually
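
To make the gap-identification step a little more concrete, here’s a minimal sketch of the idea in Python. The asset names are entirely hypothetical; in practice the two sets would come from your asset inventory (or CMDB) and an export from your monitoring tool...

  # Hypothetical example: find assets that exist but aren't monitored,
  # and monitoring entries that no longer match anything in the estate.
  inventory = {"web-01", "web-02", "db-01", "db-02", "mq-01", "backup-01"}
  monitored = {"web-01", "web-02", "db-01", "mq-01", "old-app-01"}

  gaps = inventory - monitored    # running, but nobody is watching it
  stale = monitored - inventory   # watched, but no longer in the estate

  print("Unmonitored assets:", sorted(gaps))         # db-02, backup-01
  print("Stale monitoring entries:", sorted(stale))  # old-app-01

Crude as it is, this kind of comparison is often the fastest way to see where the estate and the monitoring have drifted apart.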


“Oh but we’re all in the cloud now!” I sometimes hear...

This is another issue I have seen - an over-reliance on cloud or cloud-based tools…

Just because it’s “in the cloud” doesn’t mean it can’t fail

Just because a cloud provider might keep your server running in the cloud doesn’t mean it’s monitored how you need it monitored…

Even where monitoring capabilities are supplied by a cloud provider, they might not be sufficient for the needs of your business...

You might have additional requirements, additional “things” to monitor…

Things that have been missed by vendors, bespoke requirements, additional monitoring you would like in place to prevent outages, prevent chaos, and avoid headaches…
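
As a rough illustration of that kind of bespoke check, here’s a minimal sketch in Python using only the standard library. The endpoint URL and latency threshold are hypothetical placeholders, and a real implementation would feed the result into your alerting pipeline rather than just printing it...

  import time
  import urllib.request

  # Hypothetical values - substitute your own endpoint and threshold.
  ENDPOINT = "https://app.example.internal/health"
  MAX_LATENCY_SECONDS = 2.0

  def check_endpoint(url):
      """Return (healthy, detail) for one bespoke HTTP health check."""
      start = time.monotonic()
      try:
          with urllib.request.urlopen(url, timeout=5) as response:
              elapsed = time.monotonic() - start
              if response.status != 200:
                  return False, f"unexpected status {response.status}"
              if elapsed > MAX_LATENCY_SECONDS:
                  return False, f"slow response: {elapsed:.2f}s"
              return True, f"ok in {elapsed:.2f}s"
      except Exception as exc:  # timeouts, DNS failures, refused connections...
          return False, f"check failed: {exc}"

  healthy, detail = check_endpoint(ENDPOINT)
  if not healthy:
      # In practice this would raise an alert in your monitoring tool;
      # how that alert is routed and actioned is part of your process.
      print(f"ALERT: {ENDPOINT} - {detail}")

The point isn’t the ten lines of code - it’s that the check reflects what your business actually needs, not just what a provider gives you by default.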


I’ve also never been a fan of “top-down” views in isolation… something green on a dashboard is useless unless its meaning is understood and complete...

I believe a two-way approach is needed - top-down and bottom-up

As I remember saying to a CTO years ago (back in 2006)…

“There’s no point in having a nice shiny dashboard with green traffic lights on it if those indicators are not a true and complete reflection of all underlying connected systems”

This is still true today, some 16 years later
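
To put that quote into code, here’s a minimal sketch (with hypothetical system names) of a traffic light that only goes green when the underlying results are both healthy and complete - a missing check is treated as a fault in its own right...

  # Hypothetical example: the light goes green only if every expected
  # underlying check has reported in AND every report is healthy.
  EXPECTED = {"database", "app-server", "message-queue", "storage"}

  # Results actually received - note there is no result for "storage";
  # perhaps that check was never implemented in the first place.
  results = {"database": "ok", "app-server": "ok", "message-queue": "ok"}

  missing = EXPECTED - results.keys()
  failing = {name for name, status in results.items() if status != "ok"}

  if missing or failing:
      print("NOT GREEN - missing:", sorted(missing), "failing:", sorted(failing))
  else:
      print("GREEN - all expected systems reported healthy")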

Monitoring must be in place, complete, tested, and trusted 100% by all

Yes - 100%

If it’s not, then it’s not a matter of IF you will get a headache, but WHEN

Rip off the band-aid and diagnose the root of the problem now, not when things fail

Avoiding pain is far easier than pain management in the long run, and will save you valuable time, effort and money.

#protocol #itmonitoring #itinfrastructuremonitoring

Thank you for reading this. If you enjoyed it, please click LIKE and click SHARE to share it with your network.


About the Author:

David Gerrish has been a successful IT Contractor since 1996. He has worked throughout the UK & Europe, and has contracted further afield in countries such as Hong Kong, Singapore and Australia. He has worked with numerous blue-chip clients including Barclays, The London Stock Exchange, Hewlett Packard, Fidelity, Bupa, Cazenove and many more.

Dave is available now for contract roles and short-term monitoring consultancy. Hire him for 1 day, 5 days, a week per month, or more...

If you would like to have a chat with Dave, please email him at [email protected] to arrange a call.
