Deductive AI reposted
It was a pleasure speaking at the infra.sf meetup last week about the future of AI-driven observability. What made the experience even more meaningful was the opportunity to share the stage with Charity Majors, an engineering leader I greatly respect for her deep insights and compelling articulation.

At Deductive AI, we’ve been running AI-powered agents for root cause analysis (RCA) in production for over a year and have learned a lot along the way. While I couldn’t cover all our learnings in a short talk, I shared three design patterns that I feel apply to most domain-specific agentic workflows:

1. Search as the Core Primitive: Many people may not think of it this way, but RCA is fundamentally a search and query problem across structured, semi-structured, and unstructured data, often with extremely high cardinality and dimensionality. Searching this data effectively requires building and maintaining rich search indices that track relationships across metrics, attribute keys, values, log patterns, and code metadata. Without them, even the most advanced AI systems struggle with signal correlation and causal inference at scale.

2. Using Approximations When Possible: Humans naturally excel at eyeballing and approximating; machines do not. This gap is especially evident in observability, where exact values often matter less than detecting the presence or absence of anomalies. At scale, exhaustively analyzing every log line, metric, and trace is impractical. We found that structured sampling, which leverages entropy, anomaly scores, and cluster similarity, is a great tool for preserving statistical significance while filtering out noise.

3. Code as a First-Class Telemetry Index: We found that using code to search telemetry data significantly improves RCA. Unlike natural language, code has strict syntax, execution rules, and lower entropy, which makes it easier for (fine-tuned) language models to learn from and to predict failures. We often find that starting an investigation by retrieving and understanding the relevant code, and then using it to correlate logs, traces, and metrics, significantly improves RCA by aggressively pruning the search space.

These are just a few of our learnings, and there’s obviously much more to this. If you’re also building domain-specific agents, I’d love to hear what’s worked for you, and what hasn’t!
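The post doesn’t describe how Deductive AI implements structured sampling, so here is only a minimal, hypothetical Python sketch of the idea in pattern 2. It assumes log lines are clustered by a crude template (numbers and hex ids masked), that each cluster is scored by its surprisal (rarer patterns carry more information, i.e. an entropy-style weight) plus how over-represented it is versus a baseline window (a simple anomaly score), and that the sample budget is spent round-robin across clusters in score order. The names `log_template`, `score_clusters`, and `structured_sample` are illustrative, not a real library API.

```python
import math
import re
from collections import defaultdict


def log_template(line):
    """Hypothetical, deliberately crude log-pattern extraction: mask hex ids and numbers."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)
    return re.sub(r"\d+", "<NUM>", line)


def score_clusters(clusters, baseline_counts):
    """Score each pattern cluster: surprisal (-log2 p) plus over-representation vs. a baseline."""
    total = sum(len(members) for members in clusters.values())
    base_total = sum(baseline_counts.values())
    scores = {}
    for tmpl, members in clusters.items():
        p = len(members) / total
        surprisal = -math.log2(p)  # rare patterns carry more information
        # Laplace-smoothed baseline probability so never-seen patterns avoid a zero divisor.
        base_p = (baseline_counts.get(tmpl, 0) + 1) / (base_total + len(clusters))
        anomaly = max(0.0, math.log2(p / base_p))  # only increases count as anomalous
        scores[tmpl] = surprisal + anomaly
    return scores


def structured_sample(lines, baseline_counts, budget):
    """Return up to `budget` lines, taking one line per cluster per round, highest score first."""
    if not lines or budget <= 0:
        return []
    clusters = defaultdict(list)
    for line in lines:
        clusters[log_template(line)].append(line)
    scores = score_clusters(clusters, baseline_counts)
    ranked = sorted(clusters, key=lambda t: scores[t], reverse=True)
    sample, depth = [], 0
    while len(sample) < budget and any(depth < len(clusters[t]) for t in ranked):
        for tmpl in ranked:
            if depth < len(clusters[tmpl]) and len(sample) < budget:
                sample.append(clusters[tmpl][depth])
        depth += 1
    return sample


# Usage: 95 routine health checks and 5 timeouts; the baseline window only saw health checks.
lines = [f"GET /health 200 in {i % 5}ms" for i in range(95)]
lines += [f"ERROR timeout connecting to db-{i}" for i in range(5)]
baseline = {"GET /health <NUM> in <NUM>ms": 1000}
print(structured_sample(lines, baseline, budget=3))
# → ['ERROR timeout connecting to db-0', 'GET /health 200 in 0ms', 'ERROR timeout connecting to db-1']
```

The round-robin pass guarantees every pattern gets one slot before any pattern gets a second, so a handful of rare errors survive while the dominant, expected pattern is heavily down-sampled. A production system would replace the regex masking with a proper log-pattern miner and richer similarity measures.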
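To make pattern 3 more concrete, here is a hypothetical Python sketch of using code as an index into telemetry: extract the format strings of logging calls from source files, then map an observed log line back to the call site(s) that could have emitted it. This is not Deductive AI’s method. It assumes Python-style `logger.<level>("...", args)` calls with double-quoted, printf-style format strings, and the names `LOG_CALL`, `format_to_regex`, `build_code_index`, and `locate` are illustrative only.

```python
import re

# Hypothetical matcher for Python-style logging calls such as
# logger.error("timeout connecting to %s", host); real codebases need per-language parsers.
LOG_CALL = re.compile(r'\blog(?:ger)?\.(?:debug|info|warning|error|critical)\(\s*"([^"]*)"')


def format_to_regex(fmt):
    """Turn a printf-style format string into a regex; each %s/%d/%f/%r becomes a lazy wildcard."""
    literals = re.split(r"%[sdfr]", fmt)
    return re.compile("(.+?)".join(re.escape(part) for part in literals))


def build_code_index(sources):
    """Map every log format string found in `sources` ({path: file text}) to its path:line."""
    index = []
    for path, text in sources.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in LOG_CALL.finditer(line):
                index.append((format_to_regex(match.group(1)), f"{path}:{lineno}"))
    return index


def locate(log_line, index):
    """Return the code locations whose format string could have produced `log_line`."""
    return [loc for pattern, loc in index if pattern.fullmatch(log_line)]


sources = {
    "db/pool.py": 'def connect(host, ms):\n    logger.error("timeout connecting to %s after %dms", host, ms)\n',
    "api/health.py": 'logger.info("GET /health %d", code)\n',
}
index = build_code_index(sources)
print(locate("timeout connecting to db-7 after 500ms", index))  # → ['db/pool.py:2']
```

Once a log line is pinned to a specific call site, an agent could restrict its trace and metric queries to that service and the code paths around it, which is one way to read “aggressively pruning the search space.” The sketch ignores escaped `%%`, f-strings, and multi-line calls.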
It was standing room only as the best SRE leaders in SF gathered to find out whether they would still have a job in 5 years. Last week in SF we packed out the Convex office with the brightest minds in software infrastructure to hash out the future of observability over beer and pizza.

AI is creating a tectonic shift in the way we monitor and debug systems. In 5 years, will the SRE role still exist, or will developers own the work with an SRE copilot?

Talks from these infra MVPs got the audience of engineering leaders buzzing:
Sameer Agarwal, CTO & Co-Founder, Deductive AI: “building AI-Powered SRE Agents”
Achille Roussel, CTO & Co-Founder, Firetiger: “streaming opentelemetry signals into iceberg”
Charity Majors, CTO & Co-Founder, honeycomb.io: “physics of computing will kill your pillars”

The discussion that followed, with engineers from the early days of Slack, Segment, and Databricks and SRE leaders from Netflix, Cloudflare, and Stripe, made it clear that the bar for trusting full automation is exceptionally high (it is unlikely within 5 years), but everyone is testing new AI tools for SREs and developers.

I’ll be sharing more learnings from this group, along with where the o11y market is heading and the new opportunities emerging for startups and incumbents. Watch this space! The sign-up link to join this group next time in SF or NYC is in the comments.