Incomplete bits and pieces about telemetry types

In the business of monitoring and observability, you constantly hear the words "telemetry" and "metrics", and for good reason. We perform observation through telemetry that we receive or, just as importantly, calculate. Without data produced by the sources (hosts and other equipment, applications, frontends and backends, software and hardware components, cloud and container infrastructure, business processes), we cannot make any observation, and therefore any conclusion, about the infrastructure we are observing.

But a big question remains: what is telemetry, which types of telemetry are we dealing with, and how do different types of telemetry tell us different stories about our infrastructure?

The end goal of telemetry analysis is to come up with relevant stories about an infrastructure and the overall business that this infrastructure supports. People do not want to hear unsorted facts; a simple fact by itself does not tell much, but the stories produced from those facts illuminate the processes happening in your environment. And understanding how your telemetry can tell you a story starts with understanding what your telemetry is.

What is your Telemetry?

Where a metric defines the data, telemetry is that data associated with a timestamp. We can define three major groups of telemetry data types:

  • Facts
  • Calculations
  • Relations

What is in your Facts?

A Fact is the most basic form of telemetry. It originates from your infrastructure and/or processes and delivers data about the infrastructure and business processes you are observing. Facts are atomic and indivisible: a Fact describes a single atomic state measured at some moment in time. Facts are always placed on a timeline, and they are always produced by sources in the infrastructure or business processes; there is no such thing as an "orphaned Fact". Here are a few types of Facts that we can recognize:

  • Binary fact
  • Value fact
  • Complex fact
  • Descriptive fact


Binary Fact

A Binary Fact, as the name suggests, can always be in one of two states: TRUE or FALSE, WORKING or NOT WORKING, 1 or 0, and so on.

The use of a Binary Fact is limited to indicating the binary state of some telemetry. For example: "Application A on Host B: STARTED or STOPPED", or "Authentication of user C on cluster D: SUCCESSFUL or FAILED". If the state of the telemetry cannot be represented as binary, it should instead be a Value or Complex Fact. Binary Facts can also be the outcome of Calculations.
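To make this concrete, here is a minimal sketch (the field names are my own illustration, not any particular product's schema) of a Binary Fact as a timestamped record:

```python
import time
from dataclasses import dataclass

@dataclass
class BinaryFact:
    source: str       # which host, application, or component produced the fact
    metric: str       # what is being observed
    value: bool       # the binary state: TRUE/FALSE, WORKING/NOT WORKING, 1/0
    timestamp: float  # when the state was measured; facts always sit on a timeline

# "Application A on Host B: STARTED or STOPPED"
fact = BinaryFact(source="hostB", metric="appA.running", value=True, timestamp=time.time())
print(fact)
```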

Value fact

A Value Fact is an indicator of the state of a telemetry item that can be expressed through a single value. The large majority of telemetry items are Value Facts. A Value Fact is characterized by a single value, measured or obtained at a specific time. Examples of Value Facts:

  • Load average on host A is 0.25
  • Battery status on UPS B is CHARGING
  • Throughput on interface C of switch D is 10500 kBps

Value Facts can represent a wide variety of true Facts about the observed infrastructure or business process. When we define Value Facts, we must understand that there are different types of values for such Facts:

  • Raw value. The value represents the current state of the metric, taken from the telemetry item without any extra assumptions. For example, if "the answer is 42", we take this value without any further interpretation.
  • Counter. The counter value represents a count of some event, such as the number of packets or the number of starts or restarts of an application. A counter is a non-negative integer and commonly increases. Every counter has a "reset timestamp": the timestamp at which the counter was set to 0.
  • Delta. The arithmetic difference between the previous value and the current value. Delta Facts are used in situations where we do not care about the actual value, but rather about the immediate trend.
  • Rate. The number of events counted per unit of time (second, minute, hour, and so on). Rate values apply, non-exclusively, to cases where we care neither about the actual count of events nor about deltas, but rather about how the counter behaves along the timeline, which allows the observer to estimate various performance and load metrics. For example: "The number of I/O operations in the database is 100/sec."

And, just as for Binary Facts, a Value Fact can be the outcome of a calculation.
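To make the Counter, Delta, and Rate value types a bit more tangible, here is a minimal sketch, assuming counter samples arrive as (timestamp, value) pairs and that a drop in the counter means a reset:

```python
# Counter samples: (timestamp_seconds, counter_value), e.g. packets received on an interface.
samples = [(0, 1000), (60, 7000), (120, 13000), (180, 200)]  # last sample: counter was reset

def deltas_and_rates(samples):
    """Derive Delta and Rate value facts from raw Counter facts."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if v1 < v0:           # counter dropped: assume a reset, t1 becomes the "reset timestamp"
            delta = v1        # count accumulated since the reset
        else:
            delta = v1 - v0   # Delta: difference between current and previous value
        rate = delta / (t1 - t0)  # Rate: events per second over the sampling interval
        out.append({"timestamp": t1, "delta": delta, "rate": rate})
    return out

for fact in deltas_and_rates(samples):
    print(fact)
```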

Complex fact.

A Value Fact is really a special case of a Complex Fact; the only difference is the number of values associated with the fact. A Value Fact has a single value, while a Complex Fact has multiple values identified by keys. The values of a Complex Fact can be of the same types as those of Value Facts. Complex Facts are used when multiple values are collected at the same time, during the same probe or measurement. For example: the load average on host A is lavg1 X, lavg5 Y, lavg15 Z.

A Complex Fact can also be the result of a Computation.
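As an illustration (the key names here are an assumption on my part), the load-average example can be stored as one Complex Fact whose value is a mapping of keys to values obtained in a single probe:

```python
import time

# One probe of load average on host A yields three related values at the same timestamp,
# so they are stored as a single Complex fact rather than three separate Value facts.
complex_fact = {
    "source": "hostA",
    "metric": "load.average",
    "timestamp": time.time(),
    "value": {"lavg1": 0.25, "lavg5": 0.40, "lavg15": 0.55},
}
print(complex_fact["value"]["lavg5"])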

Descriptive fact.

The most common example of Descriptive Facts is logs. A Descriptive Fact is a special case of a Value Fact where the value is a string originally intended to be interpreted by humans. Several techniques, including parsing and pattern searching and matching, have been developed as computation functions over the values of Descriptive Facts. These techniques are intended to produce a Value Fact, Complex Fact, or Binary Fact out of the value of a Descriptive Fact.
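Here is a minimal sketch of that idea, assuming a syslog-like line; the regular expression and the resulting fact fields are purely illustrative:

```python
import re

log_line = "2024-05-01T10:15:03 clusterD sshd[812]: authentication FAILED for user carol"

# Pattern matching over a Descriptive fact's string value produces a Binary fact.
pattern = re.compile(r"authentication (?P<result>SUCCESSFUL|FAILED) for user (?P<user>\w+)")
match = pattern.search(log_line)
if match:
    binary_fact = {
        "metric": f"auth.{match.group('user')}",
        "value": match.group("result") == "SUCCESSFUL",  # Binary fact: TRUE on success
    }
    print(binary_fact)  # {'metric': 'auth.carol', 'value': False}
```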

This is a rather non-exhaustive list of value types for Value Facts, but I hope it covers the most common use cases. If you think I have missed some, let me know in the comments.

Do you compute this?

Computation types of telemetry data also produce Facts, but they differ from telemetry Facts by their origin. Where Facts are data harvested from your infrastructure and business processes, Computations are performed over already collected or computed Facts.

Computation is an integral part of data (Fact) collection. Not all Facts required for analysis can or should be gathered directly; many of them are the result of computation over already known Facts. Before we discuss the types of Computation, let's talk about the Telemetry Observation Matrix.


All metrics fit into a matrix



The Telemetry Observation Matrix (TOM) is a two- or three-dimensional matrix where the columns represent sources of telemetry, the rows represent specific metrics, and each individual cell holds the telemetry collected or calculated from a specific source for a specific metric.

When you fill the TOM with telemetry collected within roughly the same timeframe, you get a representative snapshot of the collected and calculated telemetry for your infrastructure or business processes as they exist at that specific moment.

By adding a third dimension to the TOM, we add a Timeline.
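One way to picture the TOM (purely as an illustration; real monitoring backends store telemetry differently) is a nested mapping keyed by source, metric, and timestamp:

```python
# Telemetry Observation Matrix sketch: columns = sources, rows = metrics,
# third dimension = timeline. Modeled here as tom[source][metric][timestamp] = value.
tom = {
    "hostA": {"cpu.load": {1000: 0.25, 1060: 0.30}, "mem.used_pct": {1000: 61.0, 1060: 63.5}},
    "hostB": {"cpu.load": {1000: 1.10, 1060: 0.95}, "mem.used_pct": {1000: 88.0, 1060: 90.2}},
}

# A "snapshot" of the matrix at one moment is every cell with (roughly) the same timestamp.
snapshot = {(src, metric): series[1060] for src, metrics in tom.items()
            for metric, series in metrics.items()}
print(snapshot)
```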

Types of Calculation.

Now, let's discuss the types of Computation:

  • Free-form computation.
  • Pattern searching.
  • Aggregation.

What can you compute?

Free-form computation, as the name suggests, does not limit itself to telemetry data arranged along any axis of the TOM, or to any other grouping of items. A free-form computation is performed over existing data located anywhere on the timeline, while the computing formula is defined by the user. There are many examples of free-form computation, but let me give just one: calculate the average per-process memory utilization of all processes whose name matches the string "java".
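Here is a hedged sketch of that example, assuming per-process memory facts have already been collected into a flat list; the field names are my own assumptions:

```python
# Each fact: per-process memory utilization harvested from one host at one moment.
facts = [
    {"process": "java",     "rss_mb": 512.0},
    {"process": "java",     "rss_mb": 768.0},
    {"process": "postgres", "rss_mb": 256.0},
    {"process": "javaw",    "rss_mb": 300.0},
]

# Free-form computation: the formula and the selection criteria are defined by the user,
# not tied to any single row, column, or time slice of the TOM.
java_rss = [f["rss_mb"] for f in facts if "java" in f["process"]]
average = sum(java_rss) / len(java_rss) if java_rss else 0.0
print(f"Average per-process memory of 'java' processes: {average:.1f} MB")
```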

It is all about patterns.

One of the important analytical tasks is to detect whether a sample of telemetry data matches one or more known patterns. Pattern detection is a key aspect of behavioral analysis. For example: your memory utilization spikes and then sharply drops. This could potentially indicate that some processes were killed due to memory overconsumption. There are a number of ways and methods to detect patterns in data. A tried-and-true method is to apply statistical analysis to a known data sample; the use of Machine Learning is also becoming more and more popular for this task.
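As one simple, statistics-based sketch (the window and the two-sigma threshold are arbitrary assumptions), a "spike then sharp drop" in memory utilization can be flagged by comparing the most recent samples against the mean and standard deviation of the preceding window:

```python
from statistics import mean, stdev

# Memory utilization samples (percent), oldest first.
series = [60, 61, 62, 61, 63, 62, 95, 40]

def spike_then_drop(series, sigma=2.0):
    """Return True if the series spikes above, then falls below, the typical band."""
    baseline = series[:-2]                 # treat older samples as the baseline window
    mu, sd = mean(baseline), stdev(baseline)
    spike = series[-2] > mu + sigma * sd   # second-to-last sample far above normal
    drop = series[-1] < mu - sigma * sd    # last sample far below normal
    return spike and drop

print(spike_then_drop(series))  # True: looks like a process was killed and released memory
```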

Row, Column, Timeline ....

Aggregation is a computation performed over one of the three dimensions of the TOM; a sketch of all three types follows the list below.

  • Column Aggregation is a computation performed over telemetry stored in a column of the TOM. This brings different telemetry items from a single source into the computation. A common use case is to aggregate data from related telemetry items on a single source. For example: compute the total utilization of all data partitions on a host.
  • Row Aggregation is a computation performed over telemetry stored in a row of the TOM. This brings the same telemetry item from different sources into the computation. A common use case is to perform a computation over telemetry items across the members of a cluster. Example: calculate the difference between the minimum and maximum load average among the members of a cluster.
  • Timeline Aggregation is a computation performed over values of the same telemetry item, from the same source, sampled along the timeline axis of the TOM. A common use case for this type of aggregation is statistical, ML, or other analysis of a telemetry item across time. Example: calculate the average memory utilization on host X over the last 60 minutes.
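Here is a minimal sketch of all three aggregation types, reusing the toy tom[source][metric][timestamp] layout assumed in the earlier TOM sketch:

```python
tom = {
    "hostA": {"cpu.load": {1000: 0.25, 1060: 0.30},
              "disk.data1_pct": {1060: 70.0},
              "disk.data2_pct": {1060: 55.0}},
    "hostB": {"cpu.load": {1000: 1.10, 1060: 0.95}},
}

# Column aggregation: different metrics from a single source
# (total utilization of all data partitions on hostA).
column = sum(series[1060] for metric, series in tom["hostA"].items()
             if metric.startswith("disk."))

# Row aggregation: the same metric across different sources
# (spread of load average across cluster members).
loads = [tom[src]["cpu.load"][1060] for src in tom]
row = max(loads) - min(loads)

# Timeline aggregation: the same metric, same source, across time
# (average load on hostA over all samples).
load_series = list(tom["hostA"]["cpu.load"].values())
timeline = sum(load_series) / len(load_series)

print(column, row, timeline)  # roughly: 125.0 0.65 0.275
```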

Outcome of computation

Regardless of what you are computing and which computing method you choose for the job, you will most likely end up with a Value Fact, Complex Fact, or Binary Fact as the outcome of your computation. This fact becomes part of your TOM, carrying the timestamp of the moment the computation was performed. And since the outcome of a computation becomes part of the TOM, other computations can use it, producing new outcomes, which in turn can be the basis for further computations, and so on.

Computations over known Facts and over other computation outcomes are the heart and soul of telemetry analysis and observation. If in your observation you rely only on data collected directly from your infrastructure or business processes, your observations are shallow, and you will likely not reach a deeper understanding of the processes you are trying to observe. Your conclusions are always only as good as the data you have.

What are your Relations?

The third telemetry type is the Relation. A Relation records a relationship that exists between two Facts. The Facts linked by a Relation can be Binary Facts, Value Facts, or Complex Facts. What are the use cases for the Relation telemetry type? Every time you want to analyze fine-grained facts, some of that analysis will be difficult, if possible at all, without establishing relationships between the recorded facts. Example: the number of requests from user X to application Y is Z.
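A sketch of how such a relation might be recorded (the structure is entirely my own illustration): the relation names the two related entities and carries its own value and timestamp:

```python
import time

# Relation telemetry: links two observed entities (user X, application Y) and records
# a value that only makes sense for that pair: "number of requests from X to Y is Z".
relation_fact = {
    "relation": "requests",
    "from": "userX",
    "to": "applicationY",
    "value": 42,             # a Counter-style value attached to the relationship
    "timestamp": time.time(),
}
print(relation_fact)
```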

Conclusion

In this very short article, I have made an attempt to bring together my thoughts about the various types of telemetry data that can be used for exchange, storage, and analysis. This data is generated by some infrastructure and/or business processes and usually has static typing. And while I do not discount that I may have missed something, I hope that I have covered the most important types of telemetry data.
