What’s the difference between observability and monitoring?

NordVPN

Cutting-edge security for your data traffic, complete privacy online, and internet with no borders with NordVPN.

Published: January 24, 2023

Monitoring vs Observability

“Monitoring” and “observability” are often used interchangeably, but these concepts have a few fundamental differences.

Monitoring is the process of using telemetry data to understand the health and performance of your application. The telemetry collected for monitoring is preconfigured, which implies that the user already knows the system’s likely failure scenarios and wants to detect them as soon as they happen.

In the classical approach to monitoring, we define a set of metrics, collect them from our software system, and react to any changes in the values of these metrics that are of interest to us.

For example:

Excessive CPU usage can indicate that we need to scale the system up to compensate for increasing load.
A drop in successfully served requests after a fresh release can indicate that the newly released version of the API is malfunctioning.
Health checks report a binary metric: whether the system is alive or not.
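
The classical approach above can be sketched as a fixed set of rules evaluated against each metrics sample. This is a minimal illustration, not a real alerting system: the metric names (cpu_percent, success_ratio, healthy) and the thresholds are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str                          # alert name (hypothetical)
    metric: str                        # key expected in the metrics sample
    breached: Callable[[float], bool]  # predicate that decides whether to alert


# Predefined rules: we must know the failure scenarios in advance.
RULES = [
    Rule("high_cpu", "cpu_percent", lambda v: v > 85.0),           # assumed threshold
    Rule("low_success_rate", "success_ratio", lambda v: v < 0.99),  # assumed threshold
    Rule("health_check_failed", "healthy", lambda v: v == 0),       # binary metric
]


def evaluate(sample: dict[str, float], rules: list[Rule] = RULES) -> list[str]:
    """Return the names of the rules breached by one sample of metrics."""
    return [
        rule.name
        for rule in rules
        if rule.metric in sample and rule.breached(sample[rule.metric])
    ]


alerts = evaluate({"cpu_percent": 92.5, "success_ratio": 0.999, "healthy": 1})
print(alerts)  # ['high_cpu']
```

Anything that is not covered by a rule goes unnoticed, which is the limitation observability tries to address.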

Observability extends this approach. Observability is the ability to understand the state of the system by continuously analyzing, in real time, the data it outputs.

Instead of just collecting and watching predefined metrics, we continuously collect different output signals. The most common types of signals – the three pillars of observability – are:

Metrics: Numeric data aggregates representing software system performance.
Logs: Time-stamped messages emitted by the software system and its components while working.
Traces: Maps of the paths taken by requests as they move through the software system.

The development of complex distributed microservice architectures has led to complex failure scenarios that can be hard or even impossible to predict. Simple monitoring is not enough to catch them. Observability helps by improving our understanding of the internal state of the system.

Metrics

Choosing the right metrics to collect is key to establishing an observability layer for our software system. Here are a few different popular approaches that define a unified framework of must-have metrics in any software system.

USE

Originally described by Brendan Gregg, this approach focuses more on white-box monitoring – monitoring of the infrastructure itself. Here’s the framework:

Utilization – resource utilization.
  % of CPU / RAM / Network I/O being utilized.
Saturation – how much remaining work hasn’t been processed yet.
  CPU run queue length.
  Storage wait queue length.
Errors – errors per second.
  CPU cache miss.
  Storage system fail events.

Note: Defining “saturation” in this approach can be a tricky task and may not be possible in specific cases.
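
As a rough sketch of the USE framework, the three values can be derived from one sampling interval of a single resource. The ResourceSample type and its fields are hypothetical; how busy time, queue length, and error counts are actually obtained depends on the resource and the operating system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSample:
    """One sampling interval for a single resource (hypothetical structure)."""
    busy_seconds: float      # time the resource spent doing work
    interval_seconds: float  # length of the sampling interval
    queue_length: int        # work items still waiting at sample time
    error_count: int         # error events seen during the interval


def use_metrics(sample: ResourceSample) -> dict[str, float]:
    """Compute Utilization, Saturation, and Errors for one sample."""
    return {
        "utilization_percent": 100.0 * sample.busy_seconds / sample.interval_seconds,
        "saturation_queue_length": float(sample.queue_length),
        "errors_per_second": sample.error_count / sample.interval_seconds,
    }


disk = ResourceSample(busy_seconds=45.0, interval_seconds=60.0,
                      queue_length=3, error_count=6)
print(use_metrics(disk))
```

Here saturation is approximated by queue length; for resources without a measurable queue, this field may have to be left out, as the note above warns.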

Four Golden Signals

Originally described in Google’s Site Reliability Engineering book, the Four Golden Signals framework is defined as follows:

Latency – time to process requests.
Traffic – requests per second.
Errors – errors per second.
Saturation – how full the service is (utilization of its most constrained resources).
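
A minimal sketch of how the four signals could be computed from a window of request records. The RequestRecord type, the choice of the 95th percentile for latency, and the worker-pool definition of saturation are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    duration_ms: float  # time taken to process the request
    ok: bool            # False if the request failed


def nearest_rank_percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def golden_signals(records: list[RequestRecord], window_seconds: float,
                   busy_workers: int, total_workers: int) -> dict[str, float]:
    """Summarize one time window as latency, traffic, errors, and saturation."""
    failures = sum(1 for r in records if not r.ok)
    return {
        "latency_p95_ms": nearest_rank_percentile([r.duration_ms for r in records], 95),
        "traffic_rps": len(records) / window_seconds,
        "errors_per_second": failures / window_seconds,
        # Assumption: saturation is the share of a fixed worker pool in use.
        "saturation": busy_workers / total_workers,
    }


# Ten requests lasting 10, 20, ..., 100 ms; the seventh one failed.
records = [RequestRecord(duration_ms=10.0 * i, ok=(i != 7)) for i in range(1, 11)]
signals = golden_signals(records, window_seconds=5.0, busy_workers=8, total_workers=10)
print(signals)
```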

RED

Originally described by Tom Wilkie, this approach focuses on black-box monitoring – monitoring the microservices themselves. This simplified subset of the Four Golden Signals uses the following framework:

Rate – requests per second.
Errors – errors per second.
Duration – time to process requests.
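
One way to collect RED metrics is to wrap each request handler so that every call is counted and timed. The sketch below uses only the standard library; RedRecorder and handle are hypothetical names, and a real service would more likely export these values through a metrics library than keep them in memory.

```python
import time
from functools import wraps


class RedRecorder:
    """Collects Rate, Errors, and Duration for the handlers it wraps."""

    def __init__(self) -> None:
        self.requests = 0
        self.errors = 0
        self.durations: list[float] = []

    def observe(self, func):
        """Decorator that records one observation per call."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                self.errors += 1
                raise  # the caller still sees the failure
            finally:
                self.requests += 1
                self.durations.append(time.perf_counter() - start)
        return wrapper

    def summary(self, window_seconds: float) -> dict[str, float]:
        """Summarize everything recorded so far over the given window length."""
        n = self.requests
        return {
            "rate_per_second": n / window_seconds,
            "errors_per_second": self.errors / window_seconds,
            "mean_duration_seconds": sum(self.durations) / n if n else 0.0,
        }


red = RedRecorder()


@red.observe
def handle(payload: dict) -> dict:
    """Hypothetical request handler that rejects payloads without an id."""
    if "id" not in payload:
        raise ValueError("missing id")
    return {"status": "ok", "id": payload["id"]}


for payload in [{"id": 1}, {"id": 2}, {}]:
    try:
        handle(payload)
    except ValueError:
        pass  # already counted as an error by the recorder

print(red.summary(window_seconds=1.0))
```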

Choosing and following one of these approaches allows you to unify your monitoring concept throughout the whole system and makes it easier to understand what is happening. They complement one another, and your choice may depend on which part of the system you want to monitor. These approaches also don’t exclude additional business-related metrics that vary from one component of the software system to another.

Logs

System logs are a useful source of additional context when investigating what is going on inside a system. They are immutable, time-stamped text records that provide context to your metrics.

Logs should be kept in a unified structured format like JSON. Use additional log storage and visualization tools to simplify working with the massive amount of text data the software system produces. One well-known and popular solution for log storage is Elasticsearch.
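
A minimal sketch of structured JSON logging with Python’s standard logging module. The field names (timestamp, level, logger, message, context) and the "context" key passed through extra= are choices made for this example, not a standard schema.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed via extra={"context": ...} become attributes on the record.
        context = getattr(record, "context", None)
        if context:
            entry["context"] = context
        return json.dumps(entry)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output through the root logger
    return logger


buffer = io.StringIO()  # stands in for a file or a log shipper's input
log = build_logger("checkout", buffer)
log.info("payment accepted", extra={"context": {"order_id": 1234, "latency_ms": 87}})
print(buffer.getvalue(), end="")
```

Because every line is a self-contained JSON object, a log storage tool can index fields such as level or context.order_id without parsing free-form text.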

Traces

Traces help us better understand the request flow in our system by representing the full path any given request takes through a distributed software system. This is very helpful in identifying failing nodes and bottlenecks.

Traces themselves are hierarchical structures of spans, where each span represents the request and its context at one node on its path. Common trace visualization tools such as Jaeger or Grafana display traces as waterfall diagrams showing the parent and child spans caused by the request.
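
To make the span hierarchy concrete, here is a small sketch that models spans with parent links and prints them as a text waterfall. The Span fields and the sample trace are hypothetical and much simpler than what real tracing systems record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    end_ms: int


def waterfall(spans: list[Span]) -> list[str]:
    """Return one indented line per span, parents before their children."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def visit(span: Span, depth: int) -> None:
        lines.append(f"{'  ' * depth}{span.name} [{span.start_ms}-{span.end_ms} ms]")
        for child in sorted(children.get(span.span_id, []), key=lambda s: s.start_ms):
            visit(child, depth + 1)

    for root in sorted(children.get(None, []), key=lambda s: s.start_ms):
        visit(root, 0)
    return lines


trace = [
    Span("a", None, "GET /checkout", 0, 120),
    Span("b", "a", "auth-service", 5, 25),
    Span("c", "a", "payment-service", 30, 110),
    Span("d", "c", "db query", 40, 95),
]
print("\n".join(waterfall(trace)))
```

In this sample trace most of the request time is spent inside payment-service and its database query, which is the kind of bottleneck a waterfall view makes visible.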

Conclusion

Building an observable software system lets you identify failure scenarios and possible risks during the whole system life cycle. A combination of metrics, extensive log collection and traces helps us understand what’s happening inside our system at any moment and speeds up investigations of abnormal behavior.

This article was just the first step. We’ve covered the standard approaches to metrics and briefly discussed traces and logs. But to implement an observable software system, we need to set up its components correctly to supply us with the signals we need. In part 2, we’ll discuss instrumentation approaches and modern standards in this field.
