Introduction to Monitoring the Container Environment with Prometheus and Grafana
Joshua Ashton
Revolutionising Cloud Strategies | DevOps Expert & Thought Leader | Founder @ Symposium IT | AWS Partner Prospecting League Champions 2023
Monitoring your container environment is important for a number of reasons. Essentially, it helps ensure the reliability, performance, and security of your applications, and lets you identify and resolve issues before they impact your users.
This is where Prometheus and Grafana come in...
Prometheus is a popular open-source monitoring and alerting system. It is primarily used to monitor the performance of various services and systems in a distributed environment.
Prometheus uses a pull-based model, where the Prometheus server periodically scrapes metrics from the monitored services and stores them in a time-series database.
This allows for easy querying and alerting based on the collected metrics.
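To make the pull model a little more concrete, here is a minimal sketch in Python of the service side of a scrape, using only the standard library. The metric name demo_requests_total, the path label, the sample counts, and port 8000 are all assumptions for illustration; a real service would normally use an official Prometheus client library rather than rendering the text format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counters, keyed by request path.
REQUEST_COUNTS = {"/": 3, "/health": 12}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by path.",
        "# TYPE demo_requests_total counter",
    ]
    for path, value in sorted(counts.items()):
        lines.append(f'demo_requests_total{{path="{path}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current counters at /metrics for Prometheus to scrape."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block and serve metrics; Prometheus would scrape http://localhost:<port>/metrics."""
    HTTPServer(("localhost", port), MetricsHandler).serve_forever()


print(render_metrics(REQUEST_COUNTS), end="")
```

Calling serve() starts the endpoint. The service never pushes anything; it only answers when the Prometheus server comes to collect.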
One of the key features of Prometheus is its query language, PromQL, which allows for flexible querying and analysis of the collected metrics. This can be used to build custom dashboards and alerts for specific use cases.
Prometheus also supports alerting: alerting rules are evaluated against the collected metrics, and alerts that fire are handed to Alertmanager, which sends the notifications. This can be used to alert on things like high CPU usage or slow response times.
Prometheus is highly extensible and has a large ecosystem of exporters and integrations that can be used to monitor a wide variety of systems and services, including Kubernetes and Docker, amongst others.
In essence, Prometheus is a powerful and flexible monitoring and alerting tool that is well suited for use in distributed systems and cloud environments.
So, what about visualisation and dashboards?
Prometheus dashboards are typically created using Grafana, a popular open-source visualization and dashboard tool.
Grafana allows you to create and share dashboards and alerts. It is often used in conjunction with data stores such as Prometheus, InfluxDB, and Elasticsearch to display and analyse metrics and log data.
Grafana provides a web-based interface that allows you to create custom dashboards with a variety of visualisations, including line charts, bar charts, and heat maps.
You can also create alerts that notify you when specific conditions are met, such as when a metric exceeds a certain threshold.
Grafana supports a wide range of data sources, including Prometheus, InfluxDB, Elasticsearch, and Graphite, amongst others. This allows you to easily connect to and visualize data from multiple sources in a single dashboard.
Grafana also includes a query editor, which allows you to write complex queries using PromQL (the Prometheus query language mentioned above) and other query languages to extract data from your data sources.
It's also worth noting that Grafana has a huge community and is widely used in industry. It has a wide range of plugins, dashboards and alerting options that can be easily integrated into different environments.
So, show me how to get this spun up...
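Below is an example of a PromQL query you could use in a Grafana panel to display a basic metric from a Prometheus server: the average CPU limit, in cores, across containers. It assumes kube-state-metrics is being scraped; version 2 and later expose this value as kube_pod_container_resource_limits{resource="cpu"} instead.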
avg(kube_pod_container_resource_limits_cpu_cores)
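To turn that query into an actual panel, one option is to describe the dashboard as JSON and import it into Grafana. The sketch below, in Python, builds a minimal dashboard model with a single time series panel. The dashboard and panel titles are made up for illustration, and it assumes Prometheus is the default data source, so no data source is named in the panel.

```python
import json


def build_dashboard(title, expr):
    """Return a minimal Grafana dashboard model with one time series panel."""
    return {
        "title": title,
        "panels": [
            {
                "id": 1,
                "type": "timeseries",
                "title": "Average CPU limit (cores)",  # hypothetical panel title
                # Position and size on Grafana's 24-column grid.
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                # One query ("target") per series source; refId names it.
                "targets": [{"refId": "A", "expr": expr}],
            }
        ],
    }


dashboard = build_dashboard(
    "Container overview",  # hypothetical dashboard title
    "avg(kube_pod_container_resource_limits_cpu_cores)",
)
print(json.dumps(dashboard, indent=2))
```

The printed JSON can be pasted into Grafana's dashboard import screen. Treat the field set as a starting point rather than a complete schema; exporting an existing dashboard from your own Grafana version is the most reliable way to see the full model.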
Another example is a PromQL query for the per-second rate of successful (2xx) HTTP requests handled by a service, here the job my-service (for instance, one exposed on port 8080), averaged over the last minute:
sum(rate(http_requests_total{job="my-service", code=~"2.."}[1m]))
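It helps to see roughly what rate(...[1m]) does with a counter. The Python sketch below is a deliberately simplified version: it takes the samples that fall inside the window and divides the increase by the elapsed time. Real Prometheus also handles counter resets and extrapolates to the window boundaries, which this sketch ignores, and the sample values are invented for illustration.

```python
def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) samples.

    Simplified: ignores counter resets and the extrapolation that
    Prometheus applies at the window boundaries.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical counter samples, scraped every 15 seconds within a 1-minute window.
window = [(0, 1000), (15, 1030), (30, 1060), (45, 1090)]
print(simple_rate(window))  # 2.0 requests per second
```

Because the result is per second, a counter that grew by 90 over 45 seconds shows up as 2.0, not 90. The outer sum() in the query then adds these per-series rates together across all matching series.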
You can also use built-in Prometheus metrics to monitor the health and performance of the Prometheus server itself, such as the number of scrapes, the number of samples ingested, and the memory usage of the server.
scrape_samples_scraped
scrape_duration_seconds
prometheus_tsdb_head_chunks
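Metrics like these can also be read outside of Grafana through the Prometheus HTTP API, whose instant query endpoint is /api/v1/query. The Python sketch below builds the request URL and pulls the values out of the JSON response. The base URL http://localhost:9090 and the choice to key results by the instance label are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def extract_values(response):
    """Map each returned series' instance label to its current float value."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in response["data"]["result"]
    }


def run_query(base_url, promql):
    """Send the query; this needs a reachable Prometheus server."""
    with urlopen(instant_query_url(base_url, promql), timeout=10) as resp:
        return extract_values(json.load(resp))


# Assumed local server address; adjust to your environment.
print(instant_query_url("http://localhost:9090", "scrape_duration_seconds"))
```

With a server running, run_query("http://localhost:9090", "scrape_duration_seconds") would return one entry per scraped target. Sample values arrive as strings in the response, which is why extract_values converts them to floats.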
It's worth noting that Prometheus and Grafana are highly flexible and can be used to monitor and visualize a wide variety of metrics and systems. The examples above should give you a good starting point for creating your own dashboards and metrics, but you should explore other options as well.
So, what if I have, for some reason, multiple Prometheus instances and wish to query across them?
This is where Thanos comes in...
Thanos is an open-source project that provides a set of components for extending the functionality of Prometheus. It is designed to be used in large-scale, highly-available Prometheus deployments, where the storage and querying of metrics can become a bottleneck.
The main feature of Thanos is its ability to provide a global query view across multiple Prometheus instances. This allows you to aggregate metrics from multiple Prometheus servers into a single, unified view, making it easier to monitor and troubleshoot large, distributed systems.
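As a rough illustration of the idea, and not of how Thanos implements it, the Python sketch below merges query results from several Prometheus instances into one view. It assumes each instance tags its series with a label that identifies the replica (called replica here, which is an assumption), and treats series that differ only in that label as duplicates.

```python
def global_view(results, replica_label="replica"):
    """Merge series from several Prometheus instances into one view.

    results is a list of per-instance lists of (labels, value) pairs.
    Series that differ only in replica_label are treated as duplicates,
    and the first one seen is kept.
    """
    merged = {}
    for instance_result in results:
        for labels, value in instance_result:
            # Identify a series by its labels, minus the replica label.
            key = tuple(sorted((k, v) for k, v in labels.items() if k != replica_label))
            merged.setdefault(key, value)
    return {key: merged[key] for key in sorted(merged)}


# Hypothetical results for the `up` metric from three Prometheus servers:
# two replicas watching the same EU cluster, and one watching a US cluster.
prom_a = [({"job": "api", "cluster": "eu", "replica": "a"}, 1.0)]
prom_b = [({"job": "api", "cluster": "eu", "replica": "b"}, 1.0)]
prom_c = [({"job": "api", "cluster": "us", "replica": "a"}, 1.0)]

view = global_view([prom_a, prom_b, prom_c])
print(len(view))  # 2
```

The two EU replicas collapse into a single series while the US series is kept, so a dashboard built on the merged view sees each workload once, regardless of how many Prometheus servers happen to be watching it.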
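Thanos also provides a number of other features, including long-term storage of metrics in object storage, downsampling of historical data, and deduplication of metrics from highly available Prometheus pairs.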
Thanos is designed to be used in conjunction with Prometheus, and is typically deployed alongside a Prometheus server. It can also be integrated with other monitoring and visualization tools, such as Grafana, to provide a unified view of metrics across multiple systems.
How do I configure Thanos to work with my distributed Prometheus environment, and what does the architecture look like? That is for another day...
“The Grafana Labs Marks are trademarks of Grafana Labs, and are used with Grafana Labs’ permission. We are not affiliated with, endorsed or sponsored by Grafana Labs or its affiliates.”
"Linux? is the registered trademark of Linus Torvalds in the U.S. and other countries."