Humans vs. Dashboards
Observability: A Human Endeavor
In the fast-paced world of site reliability engineering and DevOps, observability is often viewed as a tech-only puzzle—metrics, logs, and traces are the pieces, and voila, problem solved! But let’s be honest: behind every dashboard is a stressed-out engineer with a coffee addiction, interpreting data through a fog of exhaustion and hoping the alert is just another false positive. Observability, at its heart, is a human endeavor. The sociology of observability—how team dynamics, cognitive quirks, and differences in communication styles shape operational workflows—is what transforms data overload into something that actually helps.
Team Dynamics: The Backbone of Observability Success
Think of a team as a group project, but instead of trying to pass an exam, you’re trying to stop production from imploding. The way a team interacts often determines whether incidents are resolved quickly or become “that outage we don’t talk about.” Teams that foster psychological safety—where it’s okay to say, “I think I broke it” or “This metric makes no sense”—make far better use of their observability tools. On the flip side, siloed teams can feel like blindfolded contestants in a three-legged race. Observability tools can’t fix awkward team dynamics, but they can give everyone the same map to follow, provided you share it. After all, one of the top contributing factors behind MTTR rising for the third straight year is the inability to share information effectively across teams.
Cognitive Biases: The Hidden Saboteurs
Cognitive biases are the ultimate trolls of the human brain. Confirmation bias whispers, “That metric totally backs your theory; ignore the rest.” Availability bias reminds you of that one time the database caused problems, so it must be the database again, right? These sneaky biases can lead teams down rabbit holes of their own making. The good news? Anomaly detection and automated root cause analysis can act as the unbiased friend who says, “Maybe check over here instead?” The bad news? You still have to listen to that friend.
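To make that “unbiased friend” a little more concrete, here is a minimal sketch of the idea behind statistical anomaly detection: a rolling z-score check that flags a reading purely on the numbers, regardless of which component you already suspect. It is an illustration of the principle, not how any particular product implements it, and the window size and threshold are arbitrary assumptions.

```python
import statistics

def is_anomalous(history, latest, window=30, threshold=3.0):
    """Flag `latest` if it sits more than `threshold` standard deviations
    from the mean of the last `window` observations.

    Deliberately naive: real anomaly detection handles seasonality, trend,
    and noisy baselines. The point is that the check does not care which
    service you *think* is guilty.
    """
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough data to judge
    mean = statistics.mean(recent)
    stdev = statistics.pstdev(recent)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) / stdev > threshold

# Example: latency samples in ms. The last reading stands out on its own merits,
# not because "it's always the database."
latencies = [120, 118, 125, 119, 121, 117, 122, 120, 123, 119]
print(is_anomalous(latencies, 410))  # True
print(is_anomalous(latencies, 124))  # False
```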
The Importance of Human-Readable Telemetry
Telemetry without business context is like watching a foreign film without subtitles. Sure, it’s interesting, but you have no idea what’s going on. By tagging data with customer IDs, service regions, or performance metrics tied to revenue, you can bridge the gap between “This looks bad” and “This is costing us $10,000 an hour.” A spike in error rates might seem apocalyptic until you realize it’s only affecting the beta test for your internal widget nobody uses. Adding this kind of context helps teams prioritize real problems and stop wasting time on phantom issues. Quick plug: Splunk Observability supports unlimited cardinality, whereas other solutions restrict how much context you can incorporate because they can’t handle metadata at that scale.
This might be one of the contributing factors behind “we have all our metrics, traces, and logs, and our dashboards look sweet - but I still have no idea what the root cause is or which customers are impacted.”
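To make the tagging idea concrete, here is a minimal sketch using the OpenTelemetry Python API. The attribute names (customer.id, customer.tier, order.value_usd) and the order fields are illustrative assumptions rather than an official semantic convention; agree on the names as a team, and keep an eye on cardinality if your backend struggles with it.

```python
from opentelemetry import trace

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

def process_order(order):
    # Attach business context to the span so "error rate spiked" can become
    # "error rate spiked for enterprise customers in us-east-1".
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("customer.id", order["customer_id"])    # illustrative keys,
        span.set_attribute("customer.tier", order["tier"])         # not a standard convention
        span.set_attribute("service.region", order["region"])
        span.set_attribute("order.value_usd", order["total_usd"])
        # ... actual order handling goes here ...
```

With attributes like these flowing through your traces, “this looks bad” can be filtered down to “this is hitting paying customers in one region,” which is exactly the prioritization leverage described above.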
Communication Styles: Translating Data into Action
Ever played a game of “telephone” with telemetry? Someone says, “I think the error rate spiked,” and by the time it reaches the on-call lead, it’s, “Everything’s on fire, we need more servers!” Effective communication turns data into action without the melodrama. Annotation features, messaging platform integrations, and simple things like speaking the same jargon (here’s where OpenTelemetry will help) can bridge the gap. And if someone asks, “What does this mean for the business?”—that’s your cue to show off those charts that correlate service performance with business-level metrics (easier said than done, unless you have Splunk).
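As a small illustration of turning data into action without the melodrama, here is a sketch of a notifier that converts a raw alert payload into a plain-language message with the business context attached. The payload fields and the chat webhook URL are hypothetical; the point is that the message states scope and impact instead of “everything’s on fire.”

```python
import json
import urllib.request

def notify(alert, webhook_url):
    """Post a human-readable summary of an alert to a chat webhook.

    `alert` is a hypothetical dict such as:
      {"service": "checkout", "metric": "error_rate", "value": 0.12,
       "region": "us-east-1", "customer_tier": "enterprise"}
    """
    text = (
        f"{alert['service']}: {alert['metric']} at {alert['value']:.0%} "
        f"in {alert['region']}, affecting {alert['customer_tier']} customers. "
        "Runbook: <link to the playbook you wrote down, right?>"
    )
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # 200 means the humans have been informed
```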
The Role of Documentation: Or, Why You Should Write Things Down
We get it—writing documentation feels like eating your vegetables. But without it, observability becomes a chaotic cycle of rediscovering the same solutions. Playbooks, incident retrospectives, and shared knowledge hubs ensure that when someone figures out why the error rate spiked, the whole team doesn’t have to start from scratch next time. Future you will thank present you for taking the time to write it all down.
Tool Design and the Human Element
Ever used a tool so complex it made you want to cry? That’s the wrong kind of human factor. Observability tools should feel like an extension of your brain, not a puzzle you can’t solve. Clean interfaces, intuitive dashboards, and alerting that doesn’t cause heart palpitations are essential. Tools that don’t fit into your team’s workflow are just another thing to curse during incidents. That said, out-of-the-box dashboards are great to look at, but are they actually helping guide you to the root cause?
Conclusion: Observability Is a Human-Centric Discipline
At the end of the day, observability tools are just tools. It’s the people using them—their collaboration, communication, and occasional caffeine-fueled brilliance—that turn raw data into real insights. By embracing the human side of observability, from team dynamics to layering in business context, you can build workflows that not only solve problems but also make the process a little less painful. And isn’t that what we’re all here for?