Perusing Observability and AI Part 2
A few weeks ago I shared my thoughts on Observability and AI. As a quick recap: I mused that while early showings from GenAI provide a glimpse of an exciting future-state (one that potentially includes the holy grail of closed loop observability), we can't lose sight of the other more fundamental progress from the last few years. Progress that sets the foundation to allow enterprises to forge ahead on a productive modern observability journey, taking ownership and control of their data/telemetry, setting the stage to be ready to take advantage of any and all AI capabilities.
That post led to many great follow-on conversations, filled with rabbit holes, as is so often the case with Observability discussions. In one particular conversation, after exchanging the usual banter on how many definitions of observability we'd run into that week, we marveled at the ways GenAI has given new life to the existing forms of Observability AI (think Anomaly/Outlier detection, Probable Root Cause Analysis, Predictive Analytics, Event analytics and Alert correlation, etc.). Wait - those are AI? Yep. That led us to a fun exploration of what the math might look like behind the scenes - reminiscing about control theory - and how, when you really think about it, Observability and AI are, at their core, really just math.
So we thought - could one not come up with a novel way to explain modern observability using math, and pretty, mathematically generated visuals?
Disclaimers: (Note - we are both well versed in Observability's roots in control theory, but given the multivalent nature of Observability, highlighting the math always seemed too complex?) (Note too - AI is certainly about more than "just math" - but we're sticking to ELI5 here)
Modern Observability with Math and GenAI visuals
Starting at the top - let's revisit the actual definition of Observability.
But wait, what's control theory?
Okay - I get it. Applied mathematics! Equations! How do we populate those equations so that we can achieve 'optimality'?
OpenTelemetry is purpose-built for this. Think of OTel as the variables in the equation. The more distributed your application, the longer the equation - but, it's just math.
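To make "the equation" a little more concrete, here is a minimal sketch of the textbook control-theory test for observability (the Kalman rank condition). It assumes a toy linear system where the internal state evolves as x[k+1] = A x[k] and the telemetry we can see is y[k] = C x[k]; the system is observable when the stacked matrix of C, CA, ..., CA^(n-1) has rank n. The matrices and function names below are made up purely for illustration and are not part of OpenTelemetry or any observability product.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rank(m):
    """Matrix rank via Gaussian elimination on exact fractions."""
    rows = [[Fraction(v) for v in row] for row in m]
    r = 0  # number of pivot rows found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_observable(A, C):
    """Kalman rank test: stack C, CA, ..., CA^(n-1) and check that rank == n."""
    n = len(A)
    blocks, current = [], C
    for _ in range(n):
        blocks.extend(current)
        current = mat_mul(current, A)
    return rank(blocks) == n


# Hypothetical two-variable system: state = (position, velocity).
A = [[1, 1],
     [0, 1]]

measure_position = [[1, 0]]  # we only emit "position" as telemetry
measure_velocity = [[0, 1]]  # we only emit "velocity" as telemetry

print(is_observable(A, measure_position))
print(is_observable(A, measure_velocity))
```

In this toy setup, emitting position alone is enough to reconstruct the full state, while emitting velocity alone is not. That is the intuition behind choosing which signals to instrument: the outputs you collect determine how much of the internal state you can infer.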
And, even though your digital ecosystem may look a lot like:
And your current approach to troubleshooting sometimes feels or looks like:
With OpenTelemetry and a modern observability solution powered by math (AI!), you get:
When all else fails, let's ask ChatGPT to weigh in:
"OpenTelemetry is a collection of tools for generating and managing telemetry data. Its goal is to be a complete telemetry system, suitable for monitoring modern microservice-based distributed systems. The goal of OpenTelemetry is to maximize the “observability” of a system. This buzz-word “observability” in the telemetry community refers to how well we can understand the inner workings of a black box just by examining its external outputs, as opposed to its codebase. In a highly observable system, we will be able to pinpoint where in a distributed system a failure occurs, what the bug is, and how to fix it merely by looking at the telemetry, and not the code. OpenTelemetry aims to solve the problem of collecting telemetry data from sources and fuse them with the right context to make them usable for the evaluation. It provides a standard set of APIs and SDKs and an optional collector to collect distributed telemetry data. The huge advantage is that it enables collecting telemetry data from different components of the system which could be based on different languages, frameworks and deployed in different hosts."