Observability Hacks

Lovjit Singh Bedi

Software Engineer

Published: April 29, 2024

Observability isn't just a buzzword; it's the art of understanding the internal workings of a system from its external outputs, like metrics or logs. In the era of microservices, our systems have paradoxically become more complex and more challenging to understand.

In this piece, I'll share simple, yet effective hacks from my experience that can enhance observability within your organization or team.

1. Incentivize alert acknowledgment:

Creating a culture of responsiveness: Establish a norm where each alert is promptly acknowledged. Automatically create a ticket for every alert. This allows the on-call engineer to review each associated ticket at the end of a sprint, helping prioritize actions for the next cycle.
Encouraging adoption: Integrate this process into your team's performance metrics to foster adherence.
Practical impact: Imagine your team maintains a microservice with a 95% uptime goal. That leaves a downtime budget of roughly 18 days per year (365 × (1 − 0.95) ≈ 18), which translates to no more than about 18 P0 alerts per year, assuming each P0 alert signifies roughly a day of customer downtime. With a diligent approach to acknowledging and acting upon alerts, you'll gradually notice a reduction in trivial alerts, enhancing focus on crucial issues.
Long-term benefits: An efficient alert system reduces noise and fine-tunes your response mechanisms.
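
To make the ticket-per-alert idea and the uptime arithmetic concrete, here is a minimal Python sketch. The `Ticket` class, the in-memory ID counter, and the function names are all hypothetical stand-ins for whatever alerting and ticketing tools your team actually uses.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical in-memory ID source; a real setup would call a ticket tracker.
_ticket_ids = count(1)


@dataclass
class Ticket:
    id: int
    alert_name: str
    severity: str
    acknowledged: bool = False


def open_ticket(alert_name: str, severity: str) -> Ticket:
    """Create exactly one ticket per alert so no alert is silently dropped."""
    return Ticket(id=next(_ticket_ids), alert_name=alert_name, severity=severity)


def unacknowledged(tickets: list[Ticket]) -> list[Ticket]:
    """Tickets the on-call engineer still has to review at the end of the sprint."""
    return [t for t in tickets if not t.acknowledged]


def downtime_budget_days(uptime_target: float, days_per_year: int = 365) -> float:
    """Days of downtime per year allowed by an uptime target such as 0.95."""
    return days_per_year * (1 - uptime_target)


if __name__ == "__main__":
    tickets = [open_ticket("HighLatency", "P2"), open_ticket("ServiceDown", "P0")]
    tickets[0].acknowledged = True
    print([t.alert_name for t in unacknowledged(tickets)])
    print(round(downtime_budget_days(0.95), 2))
```

Reviewing the `unacknowledged` list at the end of each sprint is one simple way to feed the next cycle's priorities.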

2. Maintain an updated runbook:

The underrated tool: Often seen as tedious, keeping a runbook updated is invaluable. It empowers prompt and effective actions during critical incidents.
Incorporate into handovers: Make it a routine to discuss alert triggers and the adequacy of the runbook during on-call handovers.
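
One way to make the handover discussion concrete is to list the alerts that fired during the shift but have no runbook entry. The sketch below assumes, purely for illustration, that the runbook can be represented as a mapping from alert name to remediation text; the function name is hypothetical.

```python
def alerts_without_runbook(fired_alerts: list[str], runbook: dict[str, str]) -> list[str]:
    """Return alert names that fired but have a missing or empty runbook entry.

    Each alert name is reported once, in the order it first fired.
    """
    seen: set[str] = set()
    missing: list[str] = []
    for name in fired_alerts:
        if name in seen:
            continue
        seen.add(name)
        if not runbook.get(name, "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    runbook = {"ServiceDown": "Restart the pod, then check upstream health.", "DiskFull": ""}
    fired = ["ServiceDown", "DiskFull", "HighLatency", "DiskFull"]
    print(alerts_without_runbook(fired, runbook))
```

Anything this returns is a natural agenda item for the next on-call handover.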

3. Design intuitive dashboards:

Avoid information overload: In a crisis, every minute counts. Avoid cluttering dashboards with excessive graphs and charts.
Focus on key signals: Feature the four golden signals (latency, errors, traffic, saturation) prominently.
Logical grouping: Organize your dashboard into sections like upstream info, downstream info, database insights, autoscaling, etc. This speeds up dashboard loading and aids in focusing on relevant data.
Add contextual layers: Incorporating the release_id into each chart can simplify analysis, because a shift in a signal can be matched to the release that was live at the time.
Deep dive links: Include direct links to logs from the dashboard for more thorough investigations.
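
The dashboard guidelines above can be sketched as a small check plus a link builder. This is an illustration only: the section and panel names, the `https://logs.example.com/search` URL, and its query parameters are assumptions, not the API of any particular dashboard or logging tool.

```python
from urllib.parse import urlencode

GOLDEN_SIGNALS = ("latency", "errors", "traffic", "saturation")


def missing_golden_signals(dashboard: dict[str, list[str]]) -> list[str]:
    """Return the golden signals that no section of the dashboard displays.

    `dashboard` maps a section name (e.g. "upstream") to its panel names.
    """
    present = {panel for panels in dashboard.values() for panel in panels}
    return [signal for signal in GOLDEN_SIGNALS if signal not in present]


def log_link(base_url: str, service: str, release_id: str) -> str:
    """Build a deep link into log search, filtered by service and release."""
    query = urlencode({"service": service, "release_id": release_id})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    dashboard = {
        "overview": ["latency", "errors", "traffic"],
        "database": ["query_time", "connections"],
        "autoscaling": ["replica_count"],
    }
    print(missing_golden_signals(dashboard))
    print(log_link("https://logs.example.com/search", "checkout", "r42"))
```

Running a check like this when a dashboard changes helps keep the key signals from being crowded out as panels accumulate.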

4. Consistent logging across microservices:

Uniformity for efficiency: A standard logging format can significantly streamline debugging processes, eliminating the need to adapt to different log formats.
Common schema for ingestion: A common logging format lets every service share one ingestion schema and makes it easier to build the right index for search queries, so searches return results faster.
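
As a minimal sketch of a shared format, the snippet below uses Python's standard `logging` module to emit every record as one JSON object with a fixed set of fields. The field names, the `JsonFormatter` class, and the `build_logger` helper are illustrative assumptions; the point is that every service emits the same keys.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object with fixed keys."""

    def __init__(self, service: str, release_id: str) -> None:
        super().__init__()
        self.service = service
        self.release_id = release_id

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "release_id": self.release_id,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(service: str, release_id: str) -> logging.Logger:
    """Return a logger for `service` that writes JSON lines to stderr."""
    logger = logging.getLogger(service)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter(service, release_id))
    logger.handlers = [handler]  # replace any earlier handlers to avoid duplicates
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout", "r42")
    log.info("order %s placed", "A1")
```

Because each line carries the same keys, the ingestion pipeline can index fields such as `service` and `release_id` once and reuse them across every microservice.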

I hope my experiences shed some light on making observability less of a puzzle. It's not just about the tools and techniques; it’s about creating a culture that values attention to detail and readiness to adapt.

Stay curious & keep experimenting.
