What have I learned about Observability? If anything at all...
Thanks to Celeste Garry, my budding photographer, for this photo.

It has now been 20 years since I worked in Motorsport and started my journey with Observability. Wow! Time has gone fast. And as much as I don’t want to admit it, my old pit crew tops are fitting a bit too snugly nowadays.

When I reflect on my experiences from that time, some of the challenges still ring true today. One paradox around observability that I feel still applies is this: “Simple effective observability can be really difficult to implement”. That statement might elicit a response from anyone reading this, but for me it’s the word “effective” that makes all the difference. Effective observability, in the context of a modern IT solution, should in my view mean that we can easily understand both the state of the solution and its business impact.

I have been very fortunate over the past few years to work with a variety of great organisations that have been on their own journey of implementing observability solutions. Having worked with organisations ranging from large enterprises to start-ups, the common pitfalls I have seen in designing observability solutions are that we sometimes focus on the wrong areas and that we completely skip adding business context.

Be like Rick

“Rick is a legend. He is working with an airline and has proactively created a new view that shows the business the dollar value of lost revenue when the ticketing system suffers a degradation in performance. The customer absolutely loves Rick and what he has created. What a legend.”

This was a conversation I overheard recently about a colleague who had been working with a customer in the Transportation industry. It really highlighted a few things to me. The first is that the word “legend” is used by Antipodeans a lot. An awful lot. But more importantly, it highlighted that Rick had been promoted to hero-like status by talking to his customer, listening to what was important to their business and seeking to understand it.

With the advent of OpenTelemetry it’s fairly straightforward to get up and running quickly and collect logs, metrics and traces. With these inputs we can very rapidly tell whether a system is available and start to understand its state. But as they said in the Wild West, “There be gold in them thar hills..”

To reach that gold we simply need to be like Rick and go that extra step. We need to ensure that our observability solutions capture business context from the right sources. To find out what those sources are, we should ask the people who know the business: business partners, product teams and so on.
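As a rough illustration of what "adding business context" could look like, here is a minimal Python sketch that combines a technical signal (latency) with a business signal (completed bookings) to estimate revenue impact. It is not how Rick built his view; the names `WindowStats` and `estimated_revenue_loss`, the SLO threshold and all the numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """One monitoring window for a hypothetical ticketing service."""
    minutes: int              # length of the window in minutes
    p95_latency_ms: float     # technical signal, e.g. derived from traces
    bookings_completed: int   # business signal, e.g. from the booking system


def estimated_revenue_loss(window: WindowStats,
                           baseline_bookings_per_min: float,
                           avg_ticket_value: float,
                           latency_slo_ms: float) -> float:
    """Estimate the revenue shortfall for a degraded window.

    Assumption: while latency breaches the SLO, any shortfall against
    the baseline booking rate is attributed to the degradation.
    """
    if window.p95_latency_ms <= latency_slo_ms:
        return 0.0  # not degraded, so nothing is attributed
    expected = baseline_bookings_per_min * window.minutes
    shortfall = max(0.0, expected - window.bookings_completed)
    return shortfall * avg_ticket_value


# Illustrative numbers only: a 10 minute window with slow responses.
degraded = WindowStats(minutes=10, p95_latency_ms=4200.0, bookings_completed=130)
loss = estimated_revenue_loss(degraded,
                              baseline_bookings_per_min=20.0,
                              avg_ticket_value=250.0,
                              latency_slo_ms=1500.0)
print(f"Estimated revenue at risk: ${loss:,.2f}")
```

The baseline booking rate and average ticket value are exactly the kind of inputs that only the business can supply, which is why the conversation with them matters.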

Focusing on the right areas

In Formula 1 there is a widely used concept called “Burst Logging”. It’s quite simple: specific metrics that are normally captured at a set frequency get captured at a much higher resolution for a short period of time when the car is experiencing a considerable change. Those changes are generally when the car is going from one gear to the next, not when it’s sitting at 19,000 rpm at 350 km/h. When I first encountered this I thought to myself, “Why not just log at a higher frequency all the time?” The answer is quite simple - it’s a very expensive exercise from a resource perspective, so aggressive logging only occurs when necessary.
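The burst logging idea can be sketched in a few lines of Python. This is a simplified illustration and not how any Formula 1 data logger actually works: the function name `burst_log`, its parameters and the sample data are all assumptions for the example. It keeps one sample per base interval and switches to keeping every sample for a short period after a large change in value.

```python
from typing import Iterable, Iterator, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, measured value)


def burst_log(samples: Iterable[Sample],
              base_interval_s: float,
              change_threshold: float,
              burst_duration_s: float) -> Iterator[Sample]:
    """Yield the samples worth keeping from a high-frequency stream."""
    last_kept_t = None
    prev_value = None
    burst_until = float("-inf")
    for t, value in samples:
        # A large jump between consecutive samples starts (or extends) a burst.
        if prev_value is not None and abs(value - prev_value) >= change_threshold:
            burst_until = t + burst_duration_s
        prev_value = value
        in_burst = t <= burst_until
        due = last_kept_t is None or t - last_kept_t >= base_interval_s
        if in_burst or due:
            last_kept_t = t
            yield (t, value)


# Hypothetical stream: one sample per second, with a step change at t = 10.
stream = [(float(t), 100.0 if t < 10 else 200.0) for t in range(20)]
kept_times = [t for t, _ in burst_log(stream,
                                      base_interval_s=5.0,
                                      change_threshold=50.0,
                                      burst_duration_s=2.0)]
print(kept_times)  # [0.0, 5.0, 10.0, 11.0, 12.0, 17.0]
```

Only 6 of the 20 samples are kept, yet the seconds around the change are captured at full resolution.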

When I think about this in the context of how we approach observability, I feel that there are some good parallels we can apply. A common problem I have heard about in the past is that parts of the observability solution consume more resources than the system they are applied to. With legacy Application Performance Management (APM) agents this was a very common problem, and it generally resulted in the agents being disabled when the overhead was not acceptable. Completely disabling all logging and monitoring is clearly not the answer here.

In my opinion, part of the solution is to design your observability solution so that it’s laser-focused on the areas that really matter. This strategy means that you can apply a low level of basic logging across the broader solution, and then concentrate detailed logs, metrics and traces on the areas that matter most.
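One simple way to picture this for logs is Python’s standard `logging` hierarchy: set a quiet default on the parent logger and raise the detail only on the focus areas. The component names (`shop`, `shop.checkout`, `shop.payments`, `shop.catalogue`) and the chosen levels are hypothetical and for illustration only.

```python
import logging

# Hypothetical component names; the levels are illustrative, not a recommendation.
DEFAULT_LEVEL = logging.WARNING        # low level of basic logging everywhere
FOCUS_AREAS = {
    "shop.checkout": logging.DEBUG,    # an area that really matters
    "shop.payments": logging.INFO,
}


def configure_logging() -> None:
    """Apply a quiet default to the whole 'shop' tree, then open up focus areas."""
    logging.getLogger("shop").setLevel(DEFAULT_LEVEL)
    for name, level in FOCUS_AREAS.items():
        logging.getLogger(name).setLevel(level)


configure_logging()

# Child loggers without their own level inherit it from the 'shop' parent.
catalogue = logging.getLogger("shop.catalogue")
checkout = logging.getLogger("shop.checkout")
print(catalogue.isEnabledFor(logging.INFO))   # False
print(checkout.isEnabledFor(logging.DEBUG))   # True
```

The same shape of decision (a cheap default plus detailed coverage where it counts) applies to metrics resolution and trace sampling, although the mechanisms differ by tool.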

Getting all of this right can be tricky, and it can require multiple revisions of your observability solution. But if you can get it right, there are significant rewards.

Summary

So what does this all mean? Well, in summary, it really helps to take the time to understand the system you are trying to observe and to focus on what matters. And always be like Rick.

Paris Georgallis

Technology Leader | Chief Customer Officer

1 yr

It’s all about the effective value being delivered: the degraded performance of a business application and its direct impact on revenue, or loss of revenue.

Frazer Parkinson

Passionate evangelist, critical thinker. Senior Enterprise Account Executive

2 yr

Love this

Rick Jury

Principal Technical Account Engineer, Customer Success Team at Sumo Logic

2 yr

Nice article!

Richard Hackett

Obsessed by Customer Success

2 yr

Great post. Thanks for sharing my friend.

More articles by David Garry

  • Best Practices for Monitoring using APM

    Frequently I have thought of publishing an article that discussed some best practices with respect to possibly the most…

    5 comments
