Monitoring, APM, OpenTelemetry, Observability - modern-day requisites for uninterrupted business operations
A couple of months ago, I was interviewing a few candidates. I was struck when some of them revealed a fundamental misconception that APM, Observability, and Monitoring are synonymous. Later I realized that some of my DevOps engineer friends also find it hard to tell them apart. I do not consider myself an SME, but I gained a fundamental understanding of this matter during my ~3-year stint at AppDynamics in a leadership role. That motivated me to write my 35th article on this topic.
If you find it insightful and appreciate my writing, consider following me for updates on future content. I'm committed to sharing my knowledge and contributing to the coding community. Join me in spreading the word and helping others to learn.
Prologue
The success of any business depends on robust system architecture. A system architecture could be monolithic, distributed, integrated with one or more 3rd party systems, or a combination of all. Be it Monitoring, Observability, or anything else, a surveillance system must be in place for better sustainability. Somewhere on the internet, I came across an excellent explanation of how Observability differs from Monitoring in layman's terms. As I cannot remember the reference, quoting is not possible, but I can definitely state the gist I grasped.
Let us personify Monitoring as a factory technician who knows the repair techniques for typical, repetitive faults in machines. Such reactive repairs resolve the issue but do not prevent machine downtime. The person can only address the known unknowns.
Now consider Observability as a senior technician who keeps an experienced eye on the factory's central control panel to spot the early warning signs of errors in a machine, and then proactively addresses the issue to avert any possible breakdown.
I hope it illustrates a lot!
The DIKW model
Before we dive into the technical details, we must understand why a pile of data is purposeless if we fail to draw wisdom from it. No matter what or how many datasets you collect, root-cause problems can never be addressed if the data fails to deliver any insight. The DIKW model (Data-Information-Knowledge-Wisdom) illustrates the matter in detail. I would recommend reading through this article first. Also, I'm adding a pictorial guide to make it easier to grasp.
The Monitoring
A monitoring system fetches predetermined metrics/data from every layer of the system's engineering stack. That dataset is shown in a dashboard for scrutiny. However, it does not provide any collective insight or wisdom. It is the IT team's responsibility to analyze the received dataset and take appropriate action to keep the system functioning. For a complex cloud-native app, it becomes challenging for the IT team to monitor everything and make an optimal decision. AppDynamics, Prometheus and Grafana, Datadog, and Dynatrace are a few popular tools available in the market.
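To make the "predetermined metrics" idea concrete, here is a minimal sketch of threshold-based monitoring. The metric names, the limits, and the check_thresholds helper are all my own assumptions for illustration; none of the tools named above works exactly this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # name of the metric we decided to watch in advance
    limit: float  # alert when the observed value goes above this


def check_thresholds(sample: dict, thresholds: list) -> list:
    """Return one alert message per watched metric whose value exceeds its limit."""
    alerts = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is not None and value > t.limit:
            alerts.append(f"{t.metric}={value} exceeds {t.limit}")
    return alerts


# Hypothetical metric names and limits, chosen only for illustration.
THRESHOLDS = [Threshold("cpu_percent", 90.0), Threshold("disk_percent", 85.0)]

# queue_depth is unusually high, but nobody defined a threshold for it.
sample = {"cpu_percent": 97.5, "disk_percent": 40.0, "queue_depth": 1200.0}

alerts = check_thresholds(sample, THRESHOLDS)
print(alerts)
```

Note that the suspicious queue_depth value raises no alert, because it was never on the predetermined list. That is the limitation of addressing only known unknowns.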
The APM
Application Performance Monitoring (APM) is another type of tool. It focuses solely on overall user experience and application performance. It fetches data such as average response time, throughput, network traffic, error rates, predefined business KPIs and SLOs (service level objectives), and many more. That dataset is then shown in a dashboard to dig into the root cause behind any performance issue. A few APM tools are AppDynamics, New Relic, and Dynatrace.
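As a rough illustration of the figures an APM dashboard shows, the sketch below derives average response time, throughput, and error rate from a handful of request records. The Request type, the summarize function, and the rule that any status of 500 or above counts as an error are my own assumptions, not the behaviour of any particular APM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    duration_ms: float  # how long the request took
    status: int         # HTTP status code returned to the user


def summarize(requests: list, window_s: float) -> dict:
    """Compute basic APM-style figures for requests seen in a time window."""
    if not requests or window_s <= 0:
        return {"avg_response_ms": 0.0, "throughput_rps": 0.0, "error_rate": 0.0}
    errors = sum(1 for r in requests if r.status >= 500)  # assumed error rule
    return {
        "avg_response_ms": sum(r.duration_ms for r in requests) / len(requests),
        "throughput_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
    }


# Four made-up requests observed during a 2-second window.
observed = [
    Request(120.0, 200),
    Request(80.0, 200),
    Request(400.0, 500),
    Request(200.0, 200),
]

summary = summarize(observed, window_s=2.0)
print(summary)
```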
The Observability
Observability can be considered a superset that gives 360-degree control over a system. We often call it full-stack Observability as well. Apart from providing monitoring support, it gives thorough insight into how the various segments are integrated and forecasts issues through AI-driven analysis. Observability leverages telemetry data to capture the current state of the engineering stack. It involves collecting traces, logs, events, and metrics across all applications within that stack. It also makes it possible to observe any transaction and its performance metrics in granular detail from start to end, including how each layer of the stack handled it. Overall, it operates in a reactive way that helps developers with debugging, profiling, dependency analysis, and tracing an issue across the whole system. AppDynamics, Signoz, Dynatrace, Datadog, and Splunk are a few leading tools currently rocking the Observability market.
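To illustrate following a transaction from start to end, here is a small sketch that groups spans by trace and works out how much time each service spent on its own work. The Span fields and service names are hypothetical, and the arithmetic assumes child spans run one after another inside their parent, which real traces do not guarantee.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every span of one transaction
    span_id: str
    parent_id: Optional[str]  # None for the entry point of the transaction
    service: str
    duration_ms: float


def self_time_by_service(spans: list, trace_id: str) -> dict:
    """Time each service spent on its own work, excluding time in child spans."""
    in_trace = [s for s in spans if s.trace_id == trace_id]
    child_time = defaultdict(float)
    for s in in_trace:
        if s.parent_id is not None:
            child_time[s.parent_id] += s.duration_ms
    result = defaultdict(float)
    for s in in_trace:
        result[s.service] += s.duration_ms - child_time[s.span_id]
    return dict(result)


# One made-up checkout transaction plus an unrelated trace.
spans = [
    Span("t1", "a", None, "gateway", 500.0),
    Span("t1", "b", "a", "checkout", 450.0),
    Span("t1", "c", "b", "payments", 400.0),
    Span("t2", "x", None, "gateway", 30.0),
]

breakdown = self_time_by_service(spans, "t1")
bottleneck = max(breakdown, key=breakdown.get)
print(breakdown, bottleneck)
```

The gateway looks slow at 500 ms, yet the breakdown shows most of that time sits in the payments service further down the call chain.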
The OpenTelemetry (OTEL)
OpenTelemetry is not a platform but rather an open-source Observability framework. It collects and translates telemetry data, including MELT (metrics, events, logs, and traces), into a language-agnostic format. Let me explain the rudimentary concepts of the MELT model below. Please refer to the architecture screenshots as well.
If you want to explore it in depth, read this GitHub page or explore the official documentation.
Please explore this link for more details on the MELT model. Also, I would surely recommend reading the fascinating history of OTEL in this article.
I can reckon some of the primary advantages of using OpenTelemetry are:
Although it instruments and exports telemetry data, it lacks a visualization layer. Either the Engineering team must develop a custom layer, or another popular tool must be integrated to render the exported OTEL dataset.
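The split between exporting telemetry and rendering it can be sketched as follows. This is only an illustration in plain Python: it does not use the OpenTelemetry SDK or the OTLP wire format, and the record shape and the to_record, export_lines, and render_summary functions are names I made up.

```python
import json
from collections import Counter

MELT_KINDS = {"metric", "event", "log", "trace"}


def to_record(kind: str, name: str, body: object) -> dict:
    """Wrap one telemetry item in a common, tool-neutral shape."""
    if kind not in MELT_KINDS:
        raise ValueError(f"unknown telemetry kind: {kind}")
    return {"kind": kind, "name": name, "body": body}


def export_lines(records: list) -> str:
    """Export step: serialize records as JSON lines that any backend could read."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def render_summary(exported: str) -> str:
    """A stand-in for the separate visualization layer: count records per kind."""
    counts = Counter(json.loads(line)["kind"] for line in exported.splitlines())
    return ", ".join(f"{kind}: {counts[kind]}" for kind in sorted(counts))


records = [
    to_record("metric", "cpu_percent", 73.2),
    to_record("log", "checkout", "payment declined"),
    to_record("log", "checkout", "retrying payment"),
    to_record("trace", "POST /checkout", {"duration_ms": 412}),
]

exported = export_lines(records)
view = render_summary(exported)
print(view)
```

The export step knows nothing about how the data will be displayed, so the rendering function could be swapped for any dashboard tool that understands the exported format.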
AI-enabled Observability, race with Monitoring, and APM
Nowadays, applications are getting more complex, with many abstraction layers, and are kept distributed to reduce tight coupling across the IT infrastructure. Add to that increasing customer demand for a smooth 24x7x365 experience, the need for quick updates via modern CI/CD pipelines, and the continued evolution of The Great Cloud Migration. Such a volume of MELT data overwhelms IT professionals. That's where Observability and AI pitch in together.
By collecting and analyzing the MELT data, Observability tools empower the DevOps team to at least monitor all this data and regain insight into what is happening in their systems. Integrated AI brings predictability by forecasting issues based on historical MELT data. This is something that traditional Monitoring tools fail to do. When the time comes to look beyond Monitoring and manage this morass of next-gen digital complexity, AI uses machine learning to make a difference.
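As a very simplified stand-in for this kind of analysis, the sketch below flags a metric value that deviates sharply from its own recent history using a z-score. Real Observability products use far richer models; the is_anomalous function, the sample latencies, and the limit of 3 standard deviations are all my own assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, z_limit: float = 3.0) -> bool:
    """Flag `latest` if it lies more than z_limit standard deviations from the mean."""
    if len(history) < 2:
        return False  # not enough past data to judge
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]  # flat history: any change is unusual
    return abs(latest - mean(history)) / spread > z_limit


# Made-up response times in milliseconds from earlier intervals.
latency_history = [100.0, 102.0, 98.0, 101.0, 99.0]

normal = is_anomalous(latency_history, 101.0)
spike = is_anomalous(latency_history, 130.0)
print(normal, spike)
```

Unlike a fixed threshold, the limit here adapts to whatever the metric normally looks like, which is the basic idea behind learning from past telemetry.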
Observability is also leaving APM behind, as it allows teams to quickly find critical issues in their cloud-native, microservices-based apps. Modern microservice architectures increase velocity and scale, but they also bring painful complexity and unpredictability. Legacy APM tools struggle to debug such issues because they were built to examine uncomplicated monolithic applications in predictable environments.
As AI-enabled Observability brings the ultimate source of truth, many organizations have started adopting it to ease their business operations.
Future of Observability and impact on business operations
There is no stopping, as we have just embarked on the observability journey. Based on what I have researched or learned from various articles, lectures, and conversations with SMEs at AppD, I can jot down a few hypotheses on next-gen Observability opportunities.
An excellent AppD blog was published regarding the future of Observability. I would recommend skimming through that article.
Picking the right tool
Despite the plethora of tools available in the market, picking the right one is essential. However, this is a broader topic for my 36th article, if I write it in the future. AppDynamics is a forward-looking tool, widely used at the enterprise level, and it is investing a lot in the open-source OTEL framework. If you are interested, feel free to explore this link.