The Agent Wars: The Battle for Observability at Scale

Dale Frohman

Lead Director Observability Engineering. Having fun with Observability, Data, ML & AI

Published: February 11, 2025

Some service, somewhere, is throwing errors like a toddler hurling Legos. You open your dashboard. You wait. And wait. Where are the logs? Where are the traces? Oh, right, your observability pipeline is lagging because your cloud provider decided now was a great time to throttle egress.

This, my friends, is why we’re here.

Welcome to the Agent Wars

A battle for observability at scale, efficiency at the edge, and a future that doesn’t involve second-mortgaging your infrastructure budget just to store logs.

The State of the Battlefield

Observability agents are the unsung heroes of modern systems. They collect, compress, filter, and route data so we can troubleshoot faster than we break things. But which one should we use?

OpenTelemetry: The heir apparent, the chosen one, the Luke Skywalker of observability. Except… it’s still learning to use the Force. It’s young, evolving, and full of promise, but it’s not quite the fully realized Jedi we need, yet.
Grafana’s Agent: Slim, efficient, and growing. But is it a full-stack solution for logs, metrics, and traces at scale? Maybe not yet.
eBPF-based agents: The sorcery of kernel-level observability. Unparalleled efficiency, but adoption is limited. Why?

Meanwhile, traditional players (Datadog, Dynatrace, AppDynamics, etc.) have their own agents: powerful and well integrated, but decidedly not open source and often not cheap.

So where does that leave us?

The Next Move: Scaling Observability at the Edge

If you’re running everything in the cloud, your observability strategy is likely:

Ingest everything,
Pay massive egress and storage costs,
Regret your life choices.

But companies are realizing the cloud isn’t a data trash can.

It’s expensive, slow, and unnecessary for every single piece of telemetry.

Some organizations are shifting back to on-prem or hybrid architectures to control costs and optimize performance. Recent examples include:

37signals (Basecamp, HEY) moving away from the cloud,
Dropbox reducing its cloud footprint in favor of custom infrastructure,
HashiCorp leaning into hybrid models to optimize workloads.

We need open-source observability agents that can:

Run at the edge and process data locally,
Compress, filter, and route logs before they even touch the pipeline,
Store and query at the edge (hello, EdgeDelta-style architectures),
Scale like a fleet, with centralized management and zero-touch upgrades.
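To make the "compress, filter, and route before the pipeline" idea concrete, here is a minimal sketch of what that step could look like on an edge node. It is not taken from any particular agent. The severity table, the destination names (`central-pipeline`, `edge-store`), and the `process_batch` and `decode` helpers are all assumptions made up for illustration.

```python
import gzip
import json

# Hypothetical severity ranking and routing table, for illustration only.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}
ROUTES = {"ERROR": "central-pipeline", "WARN": "central-pipeline"}
DEFAULT_ROUTE = "edge-store"


def process_batch(records, min_level="INFO"):
    """Filter, route, and compress a batch of log records locally."""
    threshold = SEVERITY[min_level]
    batches = {}
    for record in records:
        level = record.get("level", "INFO")
        # Filter: drop anything below the threshold before it leaves the host.
        # Unknown levels rank as 0 here, so they are dropped too.
        if SEVERITY.get(level, 0) < threshold:
            continue
        # Route: pick a destination by severity.
        destination = ROUTES.get(level, DEFAULT_ROUTE)
        batches.setdefault(destination, []).append(record)
    # Compress: one gzip-compressed JSON-lines payload per destination.
    return {
        destination: gzip.compress(
            "\n".join(json.dumps(r, sort_keys=True) for r in kept).encode("utf-8")
        )
        for destination, kept in batches.items()
    }


def decode(payload):
    """Inverse of the compression step, handy for checking a payload."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

In this sketch only WARN and ERROR records are bound for the central pipeline. Everything else that survives the filter stays in a local store, which is where the egress savings would come from.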

Right now, there isn’t a clear winner in OSS observability agents. But teams are working on it, both publicly and privately.

2025: The Rise of the Observability + AI Agents

Now, layer in AI.

AI-assisted observability agents will auto-tune configurations,
AI models will detect anomalies before alerts explode,
AI-driven pipelines will intelligently decide what data to store, discard, or summarize.
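As a rough illustration of a pipeline deciding what to keep, the sketch below uses a plain rolling z-score as a stand-in for an AI model. It is a statistical placeholder, not a learned model, and it covers only two of the outcomes above: store the raw sample, or fold it into a summary. The `TelemetryTriage` class, the window size, and the threshold are assumptions for illustration.

```python
from collections import deque
from statistics import mean, pstdev


class TelemetryTriage:
    """Decide per sample whether to store it raw or fold it into a summary."""

    def __init__(self, window=30, z_threshold=3.0, warmup=5):
        self.history = deque(maxlen=window)  # recent samples form the baseline
        self.z_threshold = z_threshold
        self.warmup = warmup  # samples needed before anomalies are judged

    def decide(self, value):
        decision = "summarize"
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma == 0:
                # A perfectly flat baseline: any change counts as anomalous.
                anomalous = value != mu
            else:
                anomalous = abs(value - mu) / sigma > self.z_threshold
            if anomalous:
                decision = "store"
        # The sample joins the baseline either way.
        self.history.append(value)
        return decision
```

Feeding it a steady latency series and then a spike would mark the steady samples as "summarize" and the spike as "store". A real deployment would presumably swap the z-score for whatever model the team trusts.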

This isn’t just a fantasy, it’s already happening. But to truly operationalize observability, we need a universal way to:

Deploy,
Upgrade,
Configure,
And manage these agents at scale.
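One way to picture that management layer is as a comparison between the state each agent reports and the state you want, followed by a batched rollout. The sketch below assumes a hypothetical `AgentStatus` record and a `plan_rollout` helper. Neither comes from an existing tool, and a real control plane would also need health checks and rollback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentStatus:
    """State an agent reports back to the (hypothetical) control plane."""
    agent_id: str
    version: str
    config_hash: str


def plan_rollout(fleet, desired_version, desired_config_hash, batch_size=2):
    """Return batches of agent IDs whose version or config has drifted."""
    drifted = sorted(
        agent.agent_id
        for agent in fleet
        if agent.version != desired_version
        or agent.config_hash != desired_config_hash
    )
    # Upgrade in small batches so a bad release does not hit the whole fleet.
    return [drifted[i:i + batch_size] for i in range(0, len(drifted), batch_size)]


fleet = [
    AgentStatus("edge-1", "1.2.0", "abc"),
    AgentStatus("edge-2", "1.1.0", "abc"),
    AgentStatus("edge-3", "1.2.0", "old"),
    AgentStatus("edge-4", "1.1.0", "old"),
]
print(plan_rollout(fleet, "1.2.0", "abc"))
# prints [['edge-2', 'edge-3'], ['edge-4']]
```

Agents already at the desired version and config are left alone, so the same plan can be recomputed safely after every batch.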

Observability is evolving fast, and 2025 is going to be a pivotal year for open-source agent innovation. If we get this right, we reduce cost, improve reliability, and finally stop playing Where’s Waldo? with logs and metrics.

So what can you do today?

Evaluate your edge observability strategy. Are you still sending everything to the cloud?
Consider hybrid architectures that leverage unspent compute before you scale out your pipeline costs.
Keep an eye on open-source agent projects, because the next big thing isn’t coming from a SaaS vendor; it’s being built in the trenches right now.

The Agent Wars have begun. Choose wisely.

Brian Clabby

Observability GTM

1 week ago

Compress, filter, and route: heard a cool story recently about a global quick-serve shop aggregating metrics to 5% with OTel and automating the process of “opening (and closing) the firehose” when a retail store location experienced an issue. Simple webhook. Temporary burst of full metrics (and logs) ingest for RCA, then right back to 5%. They found a nice loophole for avoiding ingest and high cardinality costs.

Joydeep Chatterjee

Partner Technical Manager/Pre-Sales Architect @ Cisco | Toastmaster

2 weeks ago

Interesting perspective, Dale. Agentic AI amalgamating with observability is a game-changing proposition.

Hemendra Gaur

Manager Business Technology @ Workday | Enabling AIOps & Observability

2 weeks ago

Great observability points. I think a hybrid observability architecture with open source, where OTel agents plus MELT from multiple sources feed AIOps-supported platforms, would help.

Jonathan M. Reeve, PhD

Founder / CPO

2 weeks ago

Good one Dale Frohman - what it brings home to me is not just the agent itself, but the supporting infra (and intelligence) to operationalize them as you point out ("control plane" if you will) - and shouldn't that control plane itself also be open?

Eric Horsman

VP Solutions Engineering @ Odigos | eBPF, OpenTelemetry, Better Traces = Better Observability Decisions

2 weeks ago

Interesting points about the agent wars! I see the observability datastore and AI platform battles as ongoing, but agree that OpenTelemetry's victory as the standard format is becoming clear, especially for end-user collection. We are trying to see "how" – how do we automate and secure OpenTelemetry deployment and management at scale? At Odigos we see eBPF playing a significant role in this, and believe the next big challenge is building the orchestration layer to truly unlock the power of OTEL and #eBPF in a secure and automated way
