There is no spoon
Every joke is rooted in the human need to tell facts and a story the way the teller sees fit. And a good joke lands not for a few, but for many. With that said, let me tell you an old joke that grew out of some historical facts about the Soviet Union, survived the years and the Soviet Union itself, and is still somewhat alive and relevant today.
At an international exhibition, the Soviet Union announced the first of its kind: a mechanized, robotized barber machine. Curious spectators visited the venue and observed an enormous gray box with a head-sized hole in the middle. Most asked: “How does this work?” The dude in a white lab coat standing by the machine explained that this new, breakthrough invention removes the need for human barbers to cut anyone's hair. The apparatus can be installed anywhere and even dropped into a combat zone on a parachute. All you need to do is put your head into the hole, and the machine will do the rest. In the end, your hair will be trimmed appropriately. “But doesn't everyone have a differently shaped head? How does the machine deal with that?” the spectators asked curiously. “Oh, no problem,” the cheerful presenter smiled. “The difference will persist only until the first cutting session. After that, there will be no difference.”
As with any joke, this one carries a hidden message, and of course, the real point is not about cutting hair. It is about unifying views and ways of thinking in what you might call a “totalitarian state.”
How could this be related to the Art of Monitoring and Observability? In 2016, Beyer, B., Jones, C., Murphy, N. & Petoff, J. published the book “Site Reliability Engineering: How Google Runs Production Systems,” which introduced the idea of the “Four Golden Signals.”
In general, this book is handy for a practicing SRE: it has some very decent tips on risk management, toil elimination, tracking, troubleshooting, and much more. But it is not an unquestionable “Bible.” Every SRE should read this book and use what applies to their own practice.
One of the most questionable statements in this book is the idea of the “golden signals.” They are latency, traffic, errors, and saturation.
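To make the four categories concrete before questioning them, here is a minimal sketch, assuming a hypothetical service that records per-request duration, response size, and status code. The field names, the 5xx error criterion, and the capacity constant are illustrative assumptions, not definitions taken from the SRE book.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    duration_ms: float  # how long the request took
    bytes_sent: int     # response size
    status: int         # HTTP status code


MAX_INFLIGHT = 100      # assumed concurrency capacity of the service


def golden_signals(window: list[Request], inflight_now: int, window_s: float) -> dict:
    """Compute the four classic signals over a window of requests (illustrative only)."""
    durations = [r.duration_ms for r in window]
    # Latency: p99 of request durations (fall back gracefully for tiny windows)
    if len(durations) >= 2:
        p99 = quantiles(durations, n=100)[98]
    else:
        p99 = durations[0] if durations else 0.0
    return {
        "latency_p99_ms": p99,
        # Traffic: either requests per second or bytes per second
        "traffic_rps": len(window) / window_s,
        "traffic_bytes_per_s": sum(r.bytes_sent for r in window) / window_s,
        # Errors: share of requests that failed (5xx here, by assumption)
        "error_ratio": sum(r.status >= 500 for r in window) / max(len(window), 1),
        # Saturation: how "full" the service is against its assumed capacity
        "saturation": inflight_now / MAX_INFLIGHT,
    }


if __name__ == "__main__":
    demo = [Request(12.0, 512, 200), Request(840.0, 128, 503), Request(35.0, 2048, 200)]
    print(golden_signals(demo, inflight_now=42, window_s=60.0))
```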
You may ask: why do I see this selection of signal categories as problematic? As in the opening joke, there are “different heads,” and that difference must be preserved even after the “first cut.” The problem is that many people, including many observability practitioners, trust Google's opinion without question or a shred of doubt. So when you try to apply those four supposedly universal categories to every IT practice and monitoring need, you sometimes have to “hammer them in” by force. Why? This is where the difficulties begin.
First, there are multiple personas in the observability business, and those personas serve different needs and provide different services. The “golden signals,” as Google defines them, suit a very narrow slice of IT professionals, primarily those in “Site Reliability Engineer” roles responsible for “reliability.” The signals are not “golden” even for many tasks a traditional SRE is frequently accountable for. Here is a roster of some functions that require access to observability data and are not covered by the “golden signals” categories:
And this non-exhaustive list of tasks does not even touch the needs of:
And this only scratches the surface. Numerous “personas” in the IT business have extremely diverse ideas about what a “golden signal” is in a particular context and for specific, sometimes loosely defined purposes. Those ideas are rooted in the fact that different metrics, and sometimes calculated, compound metrics, make more sense for specific tasks, as the sketch below illustrates.
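As a small illustration of such a compound metric, here is a hedged sketch for a capacity-planning task: a “days until storage is exhausted” projection derived from two simpler measurements. The function name, parameters, and numbers are hypothetical.

```python
def days_until_full(used_gb: float, capacity_gb: float, daily_growth_gb: float) -> float:
    """Project how many days remain before storage is exhausted,
    assuming the current linear growth rate continues."""
    if daily_growth_gb <= 0:
        return float("inf")  # not growing: no projected exhaustion
    return (capacity_gb - used_gb) / daily_growth_gb


# Example: 1.2 TB used of 2 TB, growing ~15 GB/day -> roughly 53 days of headroom.
print(round(days_until_full(used_gb=1200, capacity_gb=2000, daily_growth_gb=15), 1))
```

For a capacity planner, this single derived number is often far more actionable than any of the four raw “golden” signals on their own.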
And what are the classic Google “golden metrics” good for? As mentioned before, latency, traffic, errors, and saturation are well suited only to measuring reliability.

Latency is a time measure of some operation or request. When a “persona” is responsible for “reliability,” latency is a critical KPI. But for other purposes, such as capacity planning, latency is a secondary KPI, and capacity-based metrics become more relevant.

Traffic is a non-descriptive measure of either bytes or requests sent to some endpoint. When you are taking care of “reliability,” traffic is usually directly related to latency and gives you an idea of how well your endpoints handle the load. A DevOps “persona's” interests, however, are usually not directly connected to measuring load, so for this “persona” traffic is secondary telemetry.

Errors, while critical for operational “reliability” tasks, are secondary at best for capacity planning and resource management. They are also secondary for a SecOps “persona”: for this class of IT personnel, analysis of signals and patterns is more crucial.

Saturation, though, is the class of KPI we can call “most universal” across different personas. Most “personas” need to observe some exhaustible resource as a primary task, so saturation is, more or less, a universal “golden signal.”
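To summarize that argument as data, here is a rough sketch of how a few of these personas might rank the four classic signals. The personas follow the discussion above, but the exact rankings are illustrative assumptions, not a standard taxonomy.

```python
PRIMARY, SECONDARY = "primary", "secondary"

# Illustrative ranking of the four classic signals per persona (assumed values).
SIGNAL_PRIORITY = {
    "reliability":       {"latency": PRIMARY,   "traffic": PRIMARY,
                          "errors": PRIMARY,    "saturation": PRIMARY},
    "capacity_planning": {"latency": SECONDARY, "traffic": PRIMARY,
                          "errors": SECONDARY,  "saturation": PRIMARY},
    "secops":            {"latency": SECONDARY, "traffic": SECONDARY,
                          "errors": SECONDARY,  "saturation": PRIMARY},
}


def primary_signals(persona: str) -> list[str]:
    """Return the signals a given persona would treat as first-class."""
    return [s for s, rank in SIGNAL_PRIORITY[persona].items() if rank == PRIMARY]


for persona in SIGNAL_PRIORITY:
    print(f"{persona}: {', '.join(primary_signals(persona))}")
```

Running it shows that only saturation appears in every persona's primary list, which is exactly the point.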
So what is the verdict? What can you take away from this short article? Foremost, the “golden signals” as defined in Google's “SRE Book” are not universal across the board. Different environments and different “personas” call for different subsets of metrics and categories, ones that give an adequate, clear view of a specific problem or series of problems through the metrics and classes that fit best. And yes, just as there are no universal “golden metrics,” we can say with certainty that building a computerized barber machine that accounts for the differences between human beings is a task that is not only far from easy but on the very brink of feasibility and practicality.