Monitoring and Observability

Monitoring provides situational awareness: it tells you when something is wrong. Observability gives insight into the whole infrastructure: it uses the data a system emits to help explain what went wrong and why it happened.

For example, actively watching a single metric for changes that indicate a problem is monitoring. A system is observable if it emits useful data about its internal state, which is crucial for determining the root cause of a problem.
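As a rough illustration of watching a single metric, the sketch below flags the moment a CPU reading has stayed above a threshold for several consecutive samples. The function name `find_breaches`, the sample data and the threshold of 90% are all assumptions made for this example, not part of any monitoring product.

```python
from typing import Iterable, List, Tuple


def find_breaches(
    samples: Iterable[Tuple[int, float]],
    threshold: float,
    min_consecutive: int = 3,
) -> List[int]:
    """Return the timestamps at which the metric has been above
    `threshold` for exactly `min_consecutive` samples in a row."""
    alerts: List[int] = []
    streak = 0
    for timestamp, value in samples:
        # Extend the streak on a breach, reset it on a healthy sample.
        streak = streak + 1 if value > threshold else 0
        # Fire once per streak, at the moment it becomes long enough.
        if streak == min_consecutive:
            alerts.append(timestamp)
    return alerts


# Hypothetical (timestamp in seconds, CPU utilization in percent) samples.
cpu_samples = [(0, 41.0), (60, 92.5), (120, 95.1), (180, 97.3), (240, 96.0), (300, 55.2)]

print(find_breaches(cpu_samples, threshold=90.0, min_consecutive=3))  # prints [180]
```

Requiring several consecutive breaches is one simple way to avoid alerting on a single spike. Note that this only says that something is wrong; explaining why is where logs and traces come in.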

Logs, Traces, Metrics: The Three Pillars of Observability

Logs, traces and metrics each provide essential insight into the system. Metrics give insight into the system's general health and performance, logs record the individual events behind a change in state, and traces connect the calls and log entries that belong to a single request as it moves across services. Together, these three pillars form the backbone of any observable system.

Logging: Logs represent state transformations within an application. They capture individual events and are granular, timestamped and immutable. When things go wrong, logs are invaluable for establishing which change in state caused the error and for determining the root cause. A good practice is to log in a structured format such as JSON, which makes logs easier to search, analyze and visualize.
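A minimal sketch of structured logging using only Python's standard `logging` and `json` modules is shown here. The `JsonFormatter` class, the `context` key passed through `extra`, and the order fields are assumptions chosen for illustration; real projects often use a dedicated library instead.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # A dict passed as extra={"context": {...}} becomes record.context.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.propagate = False
    return logger


logger = build_logger("orders")

# Log a state transition with machine-readable fields (hypothetical values).
logger.info(
    "order state changed",
    extra={"context": {"order_id": "A-1001", "old_state": "PENDING", "new_state": "PAID"}},
)
```

Because every entry is a JSON object with consistent keys, a log backend can filter on fields such as `order_id` or `new_state` rather than parsing free text.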

Traces: A trace represents the journey of a single user request through the entire stack of an application. It is the sequence of calls triggered by one event. For example, you would use traces to identify little-used parts of a stack or bottlenecks within specific parts of it. In the world of microservices, a failure can be traced back to the offending microservice or API call.
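To make the idea concrete, here is a simplified model of a trace as a list of spans linked by parent IDs. It is a sketch for illustration only: the `Span` fields, service names and timings are assumed, and real tracing systems carry more metadata. It finds the span where the most time is spent and the deepest span that reported an error.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    operation: str
    start_ms: int
    end_ms: int
    error: bool = False

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def self_time_ms(span: Span, spans: List[Span]) -> int:
    """Time spent in the span itself, excluding its direct children.
    Assumes the children run one after another (no overlap)."""
    children = sum(s.duration_ms for s in spans if s.parent_id == span.span_id)
    return span.duration_ms - children


def bottleneck(spans: List[Span]) -> Span:
    """The span with the largest self time."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


def error_origins(spans: List[Span]) -> List[Span]:
    """Failed spans none of whose direct children failed."""
    failed_parents = {s.parent_id for s in spans if s.error}
    return [s for s in spans if s.error and s.span_id not in failed_parents]


# Hypothetical trace of one checkout request across four services.
trace = [
    Span("a", None, "gateway", "GET /checkout", 0, 480, error=True),
    Span("b", "a", "orders", "create_order", 20, 460, error=True),
    Span("c", "b", "inventory", "reserve_stock", 40, 100),
    Span("d", "b", "payments", "charge_card", 110, 440, error=True),
]

print(bottleneck(trace).service)                  # prints payments
print([s.service for s in error_origins(trace)])  # prints ['payments']
```

The error is visible at the gateway, but following the parent links shows that it originated in the payments call, which is also where most of the request time went.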

Metrics: Metrics are numeric measurements aggregated over time. Monitoring solutions rely on metrics. Uptime, CPU utilization, system load, memory usage, throughput, response time and error rate are some examples of metrics.
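The sketch below shows one way raw request events might be aggregated into per-minute metrics such as throughput, error rate and average response time. The `Request` record, the 60-second window and the rule that status codes of 500 and above count as errors are assumptions for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Request:
    timestamp_s: int   # seconds since some epoch
    latency_ms: float
    status: int        # HTTP status code


def aggregate(requests: Iterable[Request], window_s: int = 60) -> Dict[int, Dict[str, float]]:
    """Group requests into fixed windows and compute metrics per window."""
    buckets: Dict[int, List[Request]] = defaultdict(list)
    for r in requests:
        # Align each request to the start of its window.
        window_start = r.timestamp_s - (r.timestamp_s % window_s)
        buckets[window_start].append(r)

    metrics: Dict[int, Dict[str, float]] = {}
    for window_start in sorted(buckets):
        batch = buckets[window_start]
        errors = sum(1 for r in batch if r.status >= 500)
        metrics[window_start] = {
            "throughput_rps": len(batch) / window_s,
            "error_rate": errors / len(batch),
            "avg_latency_ms": sum(r.latency_ms for r in batch) / len(batch),
        }
    return metrics


# Hypothetical request events spanning two one-minute windows.
requests = [
    Request(5, 120.0, 200),
    Request(20, 80.0, 200),
    Request(45, 400.0, 500),
    Request(59, 200.0, 200),
    Request(61, 100.0, 200),
    Request(119, 300.0, 503),
]

metrics = aggregate(requests)
for window_start, values in metrics.items():
    print(window_start, values)
```

Aggregation trades detail for compactness: the per-window numbers are cheap to store and alert on, while the individual events remain available in logs and traces.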


The observability journey starts with instrumentation, where data is collected from the different microservices, APIs and infrastructure components in the form of telemetry.
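As a hand-rolled sketch of what instrumentation means, the decorator below records the duration and outcome of every call to a function. The in-memory `TELEMETRY` list stands in for a real exporter, and the names `instrumented`, `reserve_stock` and `inventory-api` are hypothetical; in practice a framework would typically provide this.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Stand-in for a telemetry exporter or collector (assumption for this sketch).
TELEMETRY: List[Dict[str, Any]] = []


def instrumented(component: str) -> Callable:
    """Decorator that emits one telemetry record per call."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            outcome = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                outcome = "error"
                raise  # instrumentation must not swallow the failure
            finally:
                # Runs on both success and failure.
                TELEMETRY.append({
                    "component": component,
                    "operation": func.__name__,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "outcome": outcome,
                })

        return wrapper

    return decorator


@instrumented("inventory-api")
def reserve_stock(sku: str, quantity: int) -> str:
    if quantity <= 0:
        raise ValueError("quantity must be positive")
    return f"reserved {quantity} x {sku}"


reserve_stock("SKU-42", 2)
try:
    reserve_stock("SKU-42", 0)
except ValueError:
    pass

for record in TELEMETRY:
    print(record["operation"], record["outcome"])
```

The business function stays unchanged; the telemetry is emitted around it, which is the same separation that instrumentation libraries aim for.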

OpenTelemetry (OTEL):

OpenTelemetry is a standardized, open source framework consisting of tools, APIs and SDKs that simplify the collection of telemetry data, that is, metrics, logs and traces. By removing vendor lock-in and making common tooling available to everyone, OTEL aims to drive innovation in the observability space. Developers get a wider set of options for analyzing their logs, metrics and traces, which makes observability best practices easier to implement. It is an incubating project of the Cloud Native Computing Foundation (CNCF) and resulted from the merger of the OpenCensus and OpenTracing projects.

AWS CloudWatch Monitoring Architecture:

Azure Cloud Monitoring Architecture:

GCP Monitoring Architecture:


Vivek V.

Cloud Architect - Oracle Database@Azure/GCP/AWS | Tech Evangelist | Pre-Sales

1 year ago

To add to this list, Oracle Cloud Infrastructure (OCI) has the Observability and Management (O&M) platform to monitor, analyze and manage multicloud applications and infrastructure environments with full-stack visibility, prebuilt analytics and automation. You can read more here: https://www.oracle.com/in/manageability/
