There is no spoon

Every joke is rooted in the human need to tell some facts and a story the way the teller sees fit. And a good joke suits not a few, but many. That said, let me tell you an old joke that originated from some historical facts about the Soviet Union, outlived the Soviet Union itself, and is still alive and relevant today.

At an international exhibition, the Soviet Union announced the first machine of its kind: a mechanized, robotic barber. Curious spectators visited the venue and saw an enormous gray box with a head-sized hole in the middle. Most asked: “How does this work?” The man in a white lab coat presenting the machine explained that this breakthrough invention removes the need for human barbers. The apparatus can be installed anywhere and even dropped into a combat zone by parachute. All you need to do is put your head into the hole, and the machine will do the rest. In the end, your hair will be trimmed properly. “But doesn't everyone have a differently shaped head? How does the machine deal with that?” the spectators asked. “Oh, no problem,” the cheerful presenter smiled. “The difference will last only until the first haircut. After that, there will be no difference.”

As with any joke, this one has a hidden message, and of course, it is not really about cutting hair. It is about unifying views and ways of thinking in what you can call a “totalitarian state.”

How could this be related to the Art of Monitoring and Observability? In 2016, Beyer, B., Jones, C., Murphy, N. & Petoff, J. published the book “Site Reliability Engineering: How Google Runs Production Systems.” In it, they introduced the idea of the “Four Golden Signals.”

While this book is generally handy for a practicing SRE, as it has some very decent tips on risk management, toil elimination, tracking, troubleshooting, and much more, it is not an unquestionable “Bible.” Every SRE should read this book and use what applies to their practice.

One of the most questionable statements in this book is the idea of “golden signals.” They are:

  • Latency
  • Traffic
  • Errors
  • Saturation

You may ask: why do I see this selection of signal categories as “problematic”? As in the opening joke, there are “different heads,” and this difference must be maintained even after the “first cut.” The problem is that many people, including many observability practitioners, trust Google's opinion without question or a shred of doubt. So, when you try to apply those four general categories to all IT practices and monitoring needs, you sometimes have to “hammer” them in by force. Why? This is where the difficulties begin.

First, there are multiple personas in the observability business, and those personas serve different needs by providing various services. The “golden signals,” as defined by Google, suit a very narrow group of IT professionals, primarily those employed in “Site Reliability Engineer” roles and responsible for “reliability.” Those signals are not “golden” even for many of the tasks for which a traditional SRE is frequently accountable. Here is a roster of some functions that require access to observability data and are not covered by the “golden signals” categories:

  • Performance monitoring.
  • Capacity planning.
  • External resources monitoring.
  • APM.
  • Root cause analysis.

This non-exhaustive list of tasks does not even touch the needs of:

  • DevOps “personas,” whose tasks mainly revolve around monitoring processes and pipelines.
  • Sec Ops “personas,” who care about traffic but also rely on extensive analysis of signals and patterns.
  • Data warehouse manager and administrator “personas,” who are responsible for capacity and resource management and monitoring, as well as pipelines and application pipelines.

And this just scratches the surface. Numerous “personas” in the IT business have extremely diverse ideas about what a “golden signal” is in a particular context and for specific, sometimes not well-defined, purposes. Those ideas are rooted in the fact that different metrics, and sometimes calculated and compound metrics, make more sense for specific tasks.

And what are the classic Google “golden metrics” good for? As mentioned before, latency, traffic, errors, and saturation are suitable only for measuring reliability.

Latency is a measure of the time taken by some operation or request. When a “persona” is responsible for “reliability,” latency is a critical KPI. But for other purposes, such as “capacity planning,” latency is a secondary KPI, and capacity-based metrics become more important.

Traffic is a non-descriptive measure of either bytes or requests sent to some endpoints. When you are taking care of “reliability,” traffic is usually directly related to latency, giving you an idea of how well your endpoints handle the load. A DevOps “persona's” interests typically do not connect directly to measuring load, so this is secondary telemetry for this “persona.”

While critical for operational “reliability” tasks, errors are secondary at best for capacity planning and resource management. They are also secondary for a Sec Ops “persona.” For this class of IT personnel, analysis of signals and patterns is more crucial.

Saturation, however, is a class of KPI that we can call the “most universal” across different personas. Most “personas” need to observe some exhaustible resource as a primary task. So, saturation is spot-on: more or less a universal “golden signal.”
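
To make the four categories concrete, here is a minimal Python sketch of how they could be derived from one observation window. Everything in it is an assumption made for illustration and does not come from the SRE book: the Request record, the golden_signals function, the nearest-rank 95th percentile for latency, treating status codes of 500 and above as errors, and using a fixed pool of worker slots as the saturated resource.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    duration_ms: float  # time spent serving the request
    status: int         # HTTP-style status code (assumed convention)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def golden_signals(requests, window_s, used_slots, total_slots):
    """Summarize one observation window into the four categories."""
    if not requests:
        raise ValueError("no requests observed in this window")
    durations = sorted(r.duration_ms for r in requests)
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": percentile(durations, 95),  # latency
        "traffic_rps": len(requests) / window_s,      # traffic
        "error_rate": errors / len(requests),         # errors
        "saturation": used_slots / total_slots,       # saturation
    }


# Hypothetical sample: four requests seen in a two-second window,
# with 6 of 8 worker slots busy.
sample = [
    Request(120, 200),
    Request(80, 200),
    Request(300, 500),
    Request(100, 200),
]
signals = golden_signals(sample, window_s=2, used_slots=6, total_slots=8)
print(signals)
```

Note that only the last value, saturation, requires knowledge of a resource limit. The other three describe how requests are served, which is why they speak mostly to the “reliability” persona.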


So, what is the verdict? What conclusions can you draw from this short article? First and foremost, the “golden signals” as defined in Google's “SRE Book” are not universal across the board. Different environments and different “personas” call for different subsets of metrics and categories: the ones that best provide an adequate and neat view of a specific problem or series of problems. And yes, just as there are no “golden metrics,” we can say with certainty that building a computerized barber machine that accounts for the differences between human beings is a task that is not only hard but on the brink of feasibility and practicality.
