Perusing Observability and AI

Over the years I have often heard engineering leaders say some variation of: “I want an observability solution that simply tells me when there’s a problem, where and what the problem is, and how to fix it…or, better yet, fix it for me.” Before GenAI hit the scene, I usually replied with the merits of a full-stack approach built on OpenTelemetry, and how the OpenTelemetry distributed tracing specification provides the missing context that has kept observability solutions from reaching that next level of quickly spotting when and where a problem is happening. I still hold that position, but combined with the promise of GenAI (hype aside) and existing AI capabilities, I think it's plausible that we may one day live in a world of closed-loop observability.

Until then, what should IT and Engineering leaders make of Observability? (Especially given the general immaturity of observability across many enterprises, overlaid with ever-increasing MTTR/MTTD metrics, as evidenced in the 2023 State of Observability Report/Survey.) I spoke on this topic a few weeks ago at the C2C event in Boston, where I suggested that while GenAI has provided a glimpse of what may be possible one day, we shouldn’t lose sight of the recent developments in observability (including existing forms of AI). The pieces are in place today for enterprises to begin or improve upon their observability journey in meaningful ways, such that GenAI will merely be the gravy on top.

The long and short of it is that cloud adoption and maturity, the evolution of IT Ops teams into SRE/Platform teams, the fragmentation of observability practice and tooling, the maturity of OpenTelemetry, and the breadth of analytic capabilities now available in many commercial observability platforms (including out-of-the-box AI-directed troubleshooting, alerting, and correlation) create something of a perfect storm that sets the stage for any enterprise to implement a modern and effective observability practice. I plan to expound upon this in a future post, but for now, here are a few other takeaways worth noting:

  1. Better data/signals = better AI outputs. OpenTelemetry’s continued adoption is therefore exciting, since it lets enterprises take much-needed ownership of their telemetry and helps set the stage for that better data = better AI equation (see Jesse Tate Pulfer's OpenTelemetry primer for a deeper dive).
  2. Effective forms of AI are already here in many ways. Platforms like Splunk can automatically trigger alerts based on anomalous application, infrastructure, or user behavior, or apply deeper ML across a wider set of telemetry and KPIs.
  3. Don’t lose sight of the basics. Oftentimes business-impacting degradations are the result of change. I suspect the practice of keeping an eye on changes alongside critical KPIs would save a lot of organizations from unnecessarily prolonged incidents. This is one example of a simple use case that often goes overlooked in the larger scheme of observability.
  4. The stats on developer adoption of GenAI are enlightening. Perhaps someday we will live in a world where applications and the underlying code and infrastructure all run perfectly, such that Observability is no longer needed? (Slightly tongue-in-cheek, but have fun going down that rabbit hole for a bit.)
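
To make points 2 and 3 a bit more concrete, here is a rough Python sketch of the idea of watching changes alongside a critical KPI. It is purely illustrative and is not how Splunk or any other platform implements anomaly detection: the `ChangeEvent` type, the `find_anomalies` and `changes_before` helpers, the sample latency values, and the thresholds are all hypothetical names and numbers I made up for the example. It flags a KPI sample that strays far from its recent baseline, then lists any change events recorded shortly before it.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ChangeEvent:
    """Hypothetical record of a deploy, config push, or similar change."""
    timestamp: int      # seconds since the start of the observation period
    description: str


def find_anomalies(samples, window=5, threshold=3.0):
    """Return (timestamp, value) pairs that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = [value for _, value in samples[i - window:i]]
        mu, sigma = mean(baseline), stdev(baseline)
        ts, value = samples[i]
        # Skip flat baselines (sigma == 0) to avoid flagging on zero variance.
        if sigma > 0 and abs(value - mu) > threshold * sigma:
            anomalies.append((ts, value))
    return anomalies


def changes_before(anomaly_ts, changes, lookback=300):
    """Return changes recorded at most `lookback` seconds before the anomaly."""
    return [c for c in changes if 0 <= anomaly_ts - c.timestamp <= lookback]


# Made-up example data: p95 latency in ms, sampled once a minute.
samples = [(0, 100), (60, 102), (120, 98), (180, 101), (240, 99),
           (300, 100), (360, 103), (420, 250), (480, 260)]
changes = [
    ChangeEvent(30, "config update"),
    ChangeEvent(360, "deploy checkout-service v2"),
]

for ts, value in find_anomalies(samples):
    suspects = [c.description for c in changes_before(ts, changes)]
    print(f"t={ts}s latency={value}ms, recent changes: {suspects}")
```

Note that a simple rolling baseline like this only flags the first sample of a sustained shift, because the spike itself inflates the baseline that follows. That is one reason commercial platforms lean on more sophisticated models, but even this crude version shows how much context a change feed adds to an alert.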

In closing, I'll share a recent snippet from a conversation I had with a good friend who runs SRE at a global enterprise:

Me: So, are you excited for what GenAI can bring to your observability practice?

Friend: Ha. We've got enough to worry about just getting our data strategy right. For now, just give me OpenTelemetry, OOTB dashboards/correlations/alerts, and some good search as a fallback, and we'll be more than dangerous. But yes, I'm excited about GenAI. (Paraphrased)


