The Agent Wars: The Battle for Observability at Scale
Dale Frohman
Lead Director Observability Engineering. Having fun with Observability, Data, ML & AI
Some service, somewhere, is throwing errors like a toddler hurling Legos. You open your dashboard. You wait. And wait. Where are the logs? Where are the traces? Oh, right, your observability pipeline is lagging because your cloud provider decided now was a great time to throttle egress.
This, my friends, is why we’re here.
Welcome to the Agent Wars
A battle for observability at scale, efficiency at the edge, and a future that doesn’t involve second-mortgaging your infrastructure budget just to store logs.
The State of the Battlefield
Observability agents are the unsung heroes of modern systems. They collect, compress, filter, and route data so we can troubleshoot faster than we break things. But which one should we use?
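To make those four verbs (collect, compress, filter, route) concrete, here is a toy, stdlib-only Python sketch of what an edge agent does with a batch of telemetry. It is not how any particular agent works. The `Record` type, the rule of dropping DEBUG logs at the edge, and grouping batches by signal type are illustrative assumptions.

```python
import gzip
import json
from dataclasses import dataclass, field

# Hypothetical telemetry record; real agents (e.g. an OpenTelemetry Collector)
# use much richer data models.
@dataclass
class Record:
    signal: str                 # "logs", "metrics", or "traces"
    severity: str               # e.g. "DEBUG", "INFO", "ERROR"
    body: dict = field(default_factory=dict)

def keep(record: Record) -> bool:
    """Filter: drop DEBUG logs at the edge (an assumed rule); keep everything else."""
    return not (record.signal == "logs" and record.severity == "DEBUG")

def route(records: list[Record]) -> dict[str, list[Record]]:
    """Route: group surviving records by signal type, one batch per destination."""
    batches: dict[str, list[Record]] = {}
    for r in records:
        if keep(r):
            batches.setdefault(r.signal, []).append(r)
    return batches

def compress(batch: list[Record]) -> bytes:
    """Compress: serialize a batch as JSON lines and gzip it before shipping."""
    lines = "\n".join(json.dumps({"severity": r.severity, **r.body}) for r in batch)
    return gzip.compress(lines.encode("utf-8"))

if __name__ == "__main__":
    # Collect: in a real agent these would come from files, sockets, or SDKs.
    collected = [
        Record("logs", "DEBUG", {"msg": "cache miss"}),
        Record("logs", "ERROR", {"msg": "upstream timeout"}),
        Record("metrics", "INFO", {"name": "http.requests", "value": 42}),
    ]
    for destination, batch in route(collected).items():
        payload = compress(batch)
        print(destination, len(batch), "record(s),", len(payload), "bytes gzipped")
```

The point of doing this at the edge is that the DEBUG log never leaves the host, and what does leave is batched and compressed, which is where the egress and storage savings come from.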
Meanwhile, traditional players (Datadog, Dynatrace, AppDynamics, etc.) have their own agents: powerful, well-integrated, but decidedly not open-source and often not cheap.
So where does that leave us?
The Next Move: Scaling Observability at the Edge
If you’re running everything in the cloud, your observability strategy is likely:
But companies are realizing the cloud isn’t a data trash can.
It’s expensive, slow, and unnecessary for every single piece of telemetry.
Some organizations are shifting back to on-prem or hybrid architectures to control costs and optimize performance. Recent examples include:
We need open-source observability agents that can:
Right now, there isn’t a clear winner in OSS observability agents. But teams are working on it, both publicly and privately.
2025: The Rise of Observability + AI Agents
Now, layer in AI.
This isn’t just a fantasy; it’s already happening. But to truly operationalize observability, we need a universal way to:
Observability is evolving fast, and 2025 is going to be a pivotal year for open-source agent innovation. If we get this right, we reduce cost, improve reliability, and finally stop playing Where’s Waldo? with logs and metrics.
So what can you do today?
The Agent Wars have begun. Choose wisely.
Observability GTM
Compress, filter, and route: I heard a cool story recently about a global quick-serve shop aggregating metrics to 5% with OTel and automating the process of “opening (and closing) the firehose” when a retail store location experienced an issue. Simple webhook. Temporary burst of full metrics (and logs) ingest for RCA, then right back to 5%. They found a nice loophole for avoiding ingest and high-cardinality costs.
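The comment doesn’t say how the firehose toggle was built, so the following is only a hedged Python sketch of the control logic, under stated assumptions: “5%” is read as keeping a 5% sample of data points, the 15-minute burst window is invented, and `FirehoseController`, `on_incident_webhook`, and the `location_id` payload field are hypothetical names, not part of OpenTelemetry or any alerting tool.

```python
import random
import time

BASELINE_RATE = 0.05      # assumed: keep ~5% of data points by default
BURST_SECONDS = 15 * 60   # hypothetical length of a full-ingest window

class FirehoseController:
    """Per-location sampling with temporary 'full firehose' bursts.

    Hypothetical sketch: the webhook handler name and payload shape are
    assumptions, not a real OpenTelemetry API.
    """

    def __init__(self, clock=time.monotonic, rng=random.random):
        self._clock = clock                 # injectable for testing
        self._rng = rng                     # returns a float in [0, 1)
        self._bursts: dict[str, float] = {} # location id -> burst expiry time

    def on_incident_webhook(self, payload: dict) -> None:
        """Open the firehose for the location named in an alert webhook."""
        location = payload["location_id"]
        self._bursts[location] = self._clock() + BURST_SECONDS

    def rate_for(self, location: str) -> float:
        """Return the sampling rate currently in effect for a location."""
        expiry = self._bursts.get(location)
        if expiry is not None and self._clock() < expiry:
            return 1.0                      # burst active: ingest everything
        self._bursts.pop(location, None)    # burst expired: close the firehose
        return BASELINE_RATE

    def should_export(self, location: str) -> bool:
        """Sampling decision for one data point from this location."""
        return self._rng() < self.rate_for(location)
```

Because the window expires on its own, nobody has to remember to turn the firehose back off, which matches the “then right back to 5%” behavior described in the comment. Wiring the rate into a real collector pipeline is left out here.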
Partner Technical Manager/Pre-Sales Architect @ Cisco | Toastmaster
Interesting perspective, Dale. Agentic AI amalgamating with observability is a game-changing proposition.
Manager Business Technology @ Workday | Enabling AIOps & Observability
Great observability points. I think a hybrid observability architecture built on open source, with OTel agents shipping MELT data (metrics, events, logs, traces) from multiple sources to AIOps-supported platforms, would help.
Founder / CPO
Good one, Dale Frohman. What it brings home to me is not just the agent itself but the supporting infra (and intelligence) to operationalize them, as you point out (a "control plane," if you will). Shouldn't that control plane itself also be open?
VP Solutions Engineering @ Odigos | eBPF, OpenTelemetry, Better Traces = Better Observability Decisions
Interesting points about the agent wars! I see the observability datastore and AI platform battles as ongoing, but agree that OpenTelemetry's victory as the standard format is becoming clear, especially for end-user collection. We are trying to figure out the "how": how do we automate and secure OpenTelemetry deployment and management at scale? At Odigos, we see eBPF playing a significant role in this and believe the next big challenge is building the orchestration layer to truly unlock the power of OTel and eBPF in a secure, automated way.