Brief Introduction to Observability on AWS
Licensed image: ArtemisDiana/Shutterstock.com

Brief Introduction to Observability on AWS

APM and Observability

Observability is “the extent to which the internal states of a system can be inferred from externally available data” (Gartner). The three pillars of observability data are metrics, logs, and traces. Application Performance Monitoring (APM), a term commonly associated with observability, is “software that enables the observation and analysis of application health, performance, and user experience” (Gartner).

Additional features often associated with APM and observability products and services include the following (in alphabetical order):

  • Advanced Threat Protection (ATP)
  • Endpoint Detection and Response (EDR)
  • Incident Detection Response (IDR)
  • Infrastructure Performance Monitoring (IPM)
  • Network Device Monitoring (NDM)
  • Network Performance Monitoring (NPM)
  • OpenTelemetry (OTel)
  • Operational (or Operations) Intelligence Platform
  • Predictive Monitoring (predictive analytics / predictive modeling)
  • Real User Monitoring (RUM)
  • Security Information and Event Management (SIEM)
  • Security Orchestration, Automation, and Response (SOAR)
  • Synthetic Monitoring (directed monitoring / synthetic testing)
  • Threat Visibility and Risk Scoring
  • Unified Security and Observability Platform
  • User Behavior Analytics (UBA)

Not all features are offered by all vendors. Most vendors tend to specialize in one or more areas. Determining which features are essential to your organization before choosing a solution is essential.

AI/ML

Given the growing volume and real-time nature of observability telemetry, many vendors have started incorporating AI and ML into their products and services to improve correlation, anomaly detection, and mitigation capabilities. Understand how these features can reduce operational burden, improve insights, and simplify complexity.

Decision Factors

APM and observability tooling choices often come down to a “Build vs. Buy” decision for organizations. In the Cloud, this usually means integrating several individual purpose-built products and services, self-managed open-source projects, or investing in an end-to-end APM or unified observability platform. Other decision factors include the need for solutions to support:

  • Hybrid cloud environments (on-premises/Cloud)
  • Multi-cloud environments (Public Cloud, SaaS, Supercloud)
  • Specialized workloads (e.g., Mainframes, HPC, VMware, SAP, SAS)
  • Compliant workloads (e.g., PCI DSS, PII, GDPR, FedRAMP)
  • Edge Computing and IoT/IIoT
  • AI, ML, and Data Analytics monitoring (AIOps, MLOps, DataOps)
  • SaaS observability (SaaS providers who offer monitoring to their end-users as part of their service offering)
  • Custom log formats and protocols

Finally, the 5 V’s of big data: Velocity, Volume, Value, Variety, and Veracity, also influence the choice of APM and observability tooling. The real-time nature of the observability data, the sheer volume of the data, the source and type of data, and the sensitivity of the data, will all guide tooling choices based on features and cost.

Organizations can choose fully-managed native AWS services, AWS Partner products and services, often SaaS, self-managed open-source observability tooling, or a combination of options. Many AWS and Partner products and services are commercial versions of popular open-source software (aka COSS).

AWS Options

Data Collection, Processing, and Forwarding

  • AWS Distro for OpenTelemetry (ADOT): Open-source APIs, libraries, and agents to collect distributed traces and metrics for application monitoring. ADOT is an open-source distribution of OpenTelemetry, a “high-quality, ubiquitous, and portable telemetry [solution] to enable effective observability.
  • AWS for Fluent Bit: Fluent Bit image with plugins for both CloudWatch Logs and Kinesis Data Firehose. Fluent Bit is an open-source, “super fast, lightweight, and highly scalable logging and metrics processor and forwarder.

Security-focused Monitoring

  • AWS CloudTrail: Helps enable operational and risk auditing, governance, and compliance in your AWS environment. CloudTrail records events, including actions taken in the AWS Management Console, AWS Command Line Interface (CLI), AWS SDKs, and APIs.

Partner Options

According to the 2022 Gartner? Magic Quadrant? - APM & Observability Report, leading vendors commonly used by AWS customers include the following (in alphabetical order):

Open Source Options

There are countless open-source observability projects to choose from, including the following (in alphabetical order):

  • Elastic: Elastic Stack: Elasticsearch, Kibana, Beats, and Logstash
  • Fluentd: Data collector for unified logging layer
  • Grafana: Platform for monitoring and observability
  • Jaeger: End-to-end distributed tracing
  • Kiali: Configure, visualize, validate, and troubleshoot Istio Service Mesh
  • Loki: Like Prometheus, but for logs
  • OpenSearch: Scalable, flexible, and extensible software suite for search, analytics, and observability applications
  • OpenTelemetry (OTel): Collection of tools, APIs, and SDKs used to instrument, generate, collect, and export telemetry data
  • Prometheus: Monitoring system and time series database
  • Zabbix: Single pane of glass view of your whole IT infrastructure stack


This blog represents my viewpoints and not those of my employer, Amazon Web Services (AWS). All product names, logos, and brands are the property of their respective owners.

Pranay Prateek

Co-Founder at SigNoz | The future of Observability is Open Source | Hiring Product designers - write to [email protected] or check out signoz.io/careers | Y Combinator W21

1 年

hey Gary Stafford Would love SigNoz to be mentioned in the open source alternatives as well if possible. Here's our repo - https://github.com/signoz/signoz

Michael Hausenblas

Intelligent operations @ AWS | opinions -: own | 塞翁失马

2 年

Nice! Added it to next week's o11y.news! Minor nit, it's OTel short for OpenTelemetry (and not OTEL) and our distro is ADOT ;)

Gregory Christopher Lindsey

AWS Solutions Architect | Data Engineer

2 年

Great article and good reference

要查看或添加评论,请登录

Gary Stafford的更多文章

社区洞察

其他会员也浏览了