Day 20: Recap – Creating a Basic Observability Stack

Alex Parra

AWS Community Builder | Platform Engineer | Kubernetes | GitOps | DevOps | SRE

Published: February 3, 2025

Welcome to Day 20 of the Zero to Platform Engineer in 30 Days challenge! Today, we’re doing a recap of observability best practices and assembling a basic observability stack using Prometheus, Grafana, Loki, and OpenTelemetry.

What Is an Observability Stack?

An observability stack consists of tools that help monitor, log, and trace applications to ensure performance and reliability.

Key Components of Observability:

Metrics → Measure system health (CPU, memory, request rates).
Logs → Capture detailed event information for debugging.
Traces → Provide insights into request flows across services.

A full observability stack includes:

Prometheus → Metrics collection
Grafana → Visualization and dashboards
Loki → Centralized logging
OpenTelemetry → Distributed tracing

Building a Cloud-Native Observability Stack

Metrics and Prometheus

Collects time-series data from Kubernetes, applications, and infrastructure.
Uses PromQL to query and aggregate data.

1. Install Prometheus with Helm:

helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
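
The install commands in this recap assume the chart repositories are already registered with Helm. A minimal sketch of that prerequisite follows. The function name add_observability_repos is made up for illustration, and the repository URLs are the ones each project publishes for its charts, so verify them against the projects' own docs before relying on them.

```shell
#!/usr/bin/env bash
# Register the chart repositories used by the helm install commands in this article.
# The function name is hypothetical; the repository aliases match the chart
# references used here (prometheus-community/..., grafana/..., open-telemetry/...).
add_observability_repos() {
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
  helm repo add grafana https://grafana.github.io/helm-charts
  helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
  # Refresh the local chart index so current chart versions are visible.
  helm repo update
}
```

Calling add_observability_repos once before the first helm install should be enough.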

2. Example PromQL query (per-pod CPU usage rate over the last 5 minutes):

sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
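
To run the same query outside the Prometheus UI, one option is the Prometheus HTTP API. The sketch below assumes Prometheus has been made reachable on localhost:9090, for instance with a port-forward. The prom_query helper and the PROM_URL variable are made up for illustration, and the service name in the comment is what the chart typically generates for a release called prometheus, so confirm it with kubectl get svc -n monitoring.

```shell
#!/usr/bin/env bash
# Assumption: Prometheus is reachable at PROM_URL, e.g. after running
#   kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090
# in another terminal (service name assumed; check it in your cluster).
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Run an instant query against the Prometheus HTTP API (/api/v1/query).
# $1 is the PromQL expression; curl URL-encodes it for us.
prom_query() {
  curl -sS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

For example, prom_query 'sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)' returns the result as JSON, which can be piped into a tool such as jq.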

Dashboards and Grafana

Provides a graphical interface for creating and managing dashboards.
Allows you to visualize metrics and logs.
Integrates with Prometheus and Loki.
Supports alerting for proactive monitoring.

Install Grafana with Helm:

helm install grafana grafana/grafana --namespace monitoring
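
Once the release is up, you need the generated admin password and a way to reach the UI. The sketch below shows one way to do both, assuming the release is named grafana in the monitoring namespace as in the command above. The helper names and the GRAFANA_* variables are hypothetical, and the secret name and admin-password key follow the chart's defaults, so adjust them if you override those values. Note that the kube-prometheus-stack chart installed earlier also ships its own Grafana by default; this sketch targets the standalone release.

```shell
#!/usr/bin/env bash
# Assumptions: release name and namespace match the helm install command above.
GRAFANA_NAMESPACE="${GRAFANA_NAMESPACE:-monitoring}"
GRAFANA_RELEASE="${GRAFANA_RELEASE:-grafana}"

# Print the admin password stored in the secret the chart creates by default.
grafana_admin_password() {
  kubectl get secret --namespace "${GRAFANA_NAMESPACE}" "${GRAFANA_RELEASE}" \
    -o jsonpath='{.data.admin-password}' | base64 --decode
}

# Forward local port 3000 to the Grafana service (the chart's default service port is 80).
grafana_port_forward() {
  kubectl port-forward --namespace "${GRAFANA_NAMESPACE}" "svc/${GRAFANA_RELEASE}" 3000:80
}
```

Run grafana_admin_password to read the password, then grafana_port_forward and open http://localhost:3000 to log in as admin.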

Import a Pre-Built Dashboard:

Go to Dashboards > Import.
Enter Dashboard ID: 9135 (Kubernetes Cluster Monitoring).
Select Prometheus as the data source.
Click Import to visualize your cluster metrics.
Click Save to store your dashboard.

Logs with Loki

Lightweight, scalable, and highly available logging solution.
Works seamlessly with Grafana for log analysis.

Install Loki with Helm:

helm install loki grafana/loki-stack --namespace monitoring

Query logs in Grafana:

{job="nginx"} |= "error"
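
The same LogQL expression can also be sent straight to Loki's HTTP API, which is handy for checking that logs are arriving before any Grafana data source is configured. This is a minimal sketch, assuming Loki is reachable on localhost:3100 (for example through a port-forward). The loki_query helper, the LOKI_URL variable, and the default limit of 20 are made up for illustration.

```shell
#!/usr/bin/env bash
# Assumption: Loki is reachable at LOKI_URL, e.g. after running
#   kubectl port-forward -n monitoring svc/loki 3100
# in another terminal (service name assumed; check it in your cluster).
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Query a range of log lines via /loki/api/v1/query_range.
# $1 is the LogQL expression, $2 an optional maximum number of entries.
# With no start/end given, Loki falls back to its default recent time window.
loki_query() {
  curl -sS -G "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-20}"
}
```

For example, loki_query '{job="nginx"} |= "error"' returns matching log lines as JSON. If the result is empty, check which label values actually exist in your cluster, since the job label depends on how the log shipper is configured.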

Distributed Tracing with OpenTelemetry

Captures end-to-end request flows across microservices.
Helps debug latency issues and optimize performance.

Install OpenTelemetry with Helm:

helm install otel-collector open-telemetry/opentelemetry-collector --namespace monitoring
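
Recent releases of the opentelemetry-collector chart refuse to install unless a deployment mode is set, and newer ones also expect an explicit image repository. A hedged sketch of the command with those values follows. The function name is hypothetical, and mode=deployment together with the otel/opentelemetry-collector-k8s image are assumptions based on the chart's README, so check the README for the chart version you install.

```shell
#!/usr/bin/env bash
# Install the OpenTelemetry Collector with the values recent chart versions require.
# Assumptions: mode=deployment (a single Deployment rather than a per-node DaemonSet)
# and the Kubernetes-oriented collector image; both may need changing for your setup.
install_otel_collector() {
  helm install otel-collector open-telemetry/opentelemetry-collector \
    --namespace monitoring \
    --set mode=deployment \
    --set image.repository=otel/opentelemetry-collector-k8s
}
```

Use mode=daemonset instead if the collector should run on every node, for example to gather node-local telemetry.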

Why Use an Observability Stack?

Faster troubleshooting and debugging. Quickly identify issues and root causes.
Better performance monitoring. Identify bottlenecks and optimize resource utilization.
Improved incident response. Reduce downtime and improve customer satisfaction.

Activity for Today

Review metrics, logs, and traces and how they work together.
Install Prometheus, Grafana, Loki, and OpenTelemetry.
Test queries, dashboards, and alerts in Grafana.

What’s Next?

Tomorrow, we’ll shift focus to Internal Developer Platforms (IDPs) and discuss how platform teams improve developer experience.
