Are my systems "Observable"?
Image credit: embrace.io


People often ask: we have so much (perhaps too much?) alerting and monitoring, so are my systems what they call "observable"?

Maybe, maybe not. Let's explore.

Monitoring: The gathering of surface-level data points (in legacy systems, monitoring may mostly be event-based alerting). In very simple cases, some of these isolated data points or alerts can tell you the cause of a system failure, e.g. hardware offline or a crashed database.
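To make the idea of an isolated, threshold-based alert concrete, here is a minimal sketch in Python. The metric names, the threshold values, and the `Reading` and `check` names are all assumptions made up for illustration; they do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """One surface-level data point reported by a host."""
    host: str
    metric: str
    value: float


# Hypothetical thresholds, chosen only for illustration.
THRESHOLDS = {"cpu_percent": 90.0, "disk_used_percent": 85.0}


def check(reading: Reading) -> Optional[str]:
    """Return an alert message if the reading breaches its threshold, else None."""
    limit = THRESHOLDS.get(reading.metric)
    if limit is not None and reading.value > limit:
        return f"ALERT {reading.host}: {reading.metric}={reading.value} exceeds {limit}"
    return None
```

Each call looks at one reading on its own. That is enough to say "this disk is nearly full", but it says nothing about how that reading relates to anything else in the system.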


Visibility: Understanding the various components of your system in isolation: visibility of your servers, of your networks, of your market data, of your distributed devices.


Observability: The ability to understand the internal state of a system from its surface-level information, i.e. from the data it emits.

It puts the above two (monitoring and visibility) together and contextualizes them by adding more layers.

It is a holistic view of the entire system and/or ecosystem. It includes (but is not limited to) logs, traces (especially in distributed systems), metrics, and machine learning.


(Image credit: OpsRamp)

Centralized monitoring of the right data points, across all your devices and environments, is the foundation of observability.

Juxtaposing algorithmic real-time log analysis with centralized monitoring, visibility of the entire ecosystem, and tracing of distributed systems will go a long way toward providing observability in our systems.
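One way to picture this juxtaposition is to join log lines and trace spans on a shared request identifier. The sketch below is only an illustration: the `trace_id`, `message`, `service`, and `duration_ms` field names and the two helper functions are assumed for the example, not taken from any specific tracing tool.

```python
from collections import defaultdict


def correlate(logs, spans):
    """Group log messages and trace spans that share the same trace_id."""
    by_trace = defaultdict(lambda: {"logs": [], "spans": []})
    for log in logs:
        by_trace[log["trace_id"]]["logs"].append(log["message"])
    for span in spans:
        by_trace[span["trace_id"]]["spans"].append(
            (span["service"], span["duration_ms"])
        )
    return dict(by_trace)


def slowest_service(view):
    """Return the service with the longest span in one correlated view."""
    return max(view["spans"], key=lambda s: s[1])[0]


# Hypothetical sample data for a single request.
logs = [
    {"trace_id": "t1", "message": "checkout started"},
    {"trace_id": "t1", "message": "db connection retry"},
]
spans = [
    {"trace_id": "t1", "service": "api", "duration_ms": 40},
    {"trace_id": "t1", "service": "db", "duration_ms": 900},
]

view = correlate(logs, spans)["t1"]
print(slowest_service(view), view["logs"])
```

Taken separately, the retry log line and the slow span are each just a data point. Joined on the same request, they point at the database as the likely cause.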

Applying ML to these can provide actionable insights that allow DevOps/SRE/ITOps teams to increase the stability of the systems. With a virtuous cycle of the above and improving SLIs (service level indicators), your SLOs (service level objectives) and SLAs (service level agreements) should be achievable.
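As a rough illustration of how an SLI relates to an SLO, the sketch below computes an availability SLI as the fraction of successful requests and then the share of the error budget still unspent. The function names and the request counts are assumptions made for this example; real SLIs are defined per service.

```python
def availability_sli(good: int, total: int) -> float:
    """Fraction of requests that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


def error_budget_remaining(sli: float, slo: float) -> float:
    """Share of the error budget still unspent (negative if the SLO is breached)."""
    allowed_failure = 1.0 - slo   # failure rate the SLO tolerates
    actual_failure = 1.0 - sli    # failure rate actually observed
    return 1.0 - actual_failure / allowed_failure


# Hypothetical numbers: 99,950 good requests out of 100,000, against a 99.9% SLO.
sli = availability_sli(99_950, 100_000)
remaining = error_budget_remaining(sli, slo=0.999)
print(f"SLI={sli:.4%}, error budget remaining={remaining:.0%}")
```

With these assumed numbers, about half of the error budget is left. A team watching that figure shrink can act before the SLO, and the SLA behind it, is actually breached.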

To visualize this, imagine single drawings on separate sheets of tracing paper. One has a sun, one has a palm tree, one has a lake, one has boats. By itself, each is a correct data point but doesn't tell you much. Lay them on top of each other and they form a story, a complete picture: a sunset on a lake.

To take another example:

Imagine your system were a person who has communication problems (say, speaks a different language or is mute) and hence is unable to tell you if anything is wrong with her/him.

We check her/his temperature. It's a bit high. Is something wrong?

(S)he indicates that her/his left arm feels a bit tingly sometimes (like your intermittent connection errors from various systems).

So we SUSPECT something might be wrong, but don't know how wrong, or whether it even merits any action (and if yes, what action?).

But if the person could talk and properly describe everything (s)he is feeling (i.e. we had proper observability), then (s)he could have told us that there was some numbness, a bit of dizziness, the left leg and arm intermittently not responding, and hazy vision. That would have told us there is a high chance that the person had a stroke, and we could have taken emergency measures accordingly.



So a perfectly observable system is one whose complete internal state can be understood just from the data (and the patterns in that data) that the system provides.

In such a system, you can tell straight away (and maybe even see it coming from miles away) whether a slow response is due to calls going into loops, failed servers, memory exhaustion, or even network/switch-level issues.

The right monitoring is at its core, but observability is much more than that. Plain event- and threshold-based alerting is not a complete toolkit for complex systems.


More articles by Kaushik Banerjee (He/Him/His)
