Day 48: Kubernetes Operations - Monitoring #90DaysofDevops
Ayushi Tiwari
Java Software Developer | Certified Microsoft Technology Associate (MTA)
Kubernetes monitoring helps SREs, DevOps, and cluster admins identify performance issues, such as insufficient resources, high CPU usage, and pod failures, across their Kubernetes (K8s) environment.
It simplifies the management of your containerized applications by tracking uptime, cluster resource utilization, and interactions among cluster components.
This article explains Kubernetes monitoring, highlighting essential metrics to collect and best practices, as well as tools you can implement for effective container monitoring.
What is Kubernetes monitoring? What does it involve? Let's start with what you need to monitor in Kubernetes and why.
Kubernetes can greatly simplify application deployment in containers and across clouds, but it brings its own set of complexities. As Google notes in its Site Reliability Engineering guide, monitoring a very large, complex system has two major challenges: the vast number of components being analyzed, and the need to maintain a "reasonably low maintenance burden" on the engineers in charge. These requirements demand a monitoring system that not only raises alerts on high-level service objectives, but can also inspect individual components.
To scale an application and provide reliable service, you need insights into how the application behaves when deployed. To monitor application performance in a Kubernetes cluster, it's critical to examine the performance of containers, pods and services, as well as the characteristics of the cluster as a whole. By providing information about an application's resource usage, Kubernetes allows you to gauge application performance to detect and remove bottlenecks.
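One way to turn raw resource usage into a bottleneck signal is to compare a container's measured CPU against its configured limit. The following is a minimal sketch, not a definitive implementation: the function names and the 80% threshold are assumptions made for illustration, and the parser handles only plain-core and millicore CPU quantities such as "2" or "500m".

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes-style CPU quantity ("500m", "2", "0.5") to cores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "500m" means 500 millicores, i.e. half a core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def cpu_utilization(usage: str, limit: str) -> float:
    """Return usage as a fraction of the limit (1.0 means the limit is fully used)."""
    limit_cores = parse_cpu(limit)
    if limit_cores <= 0:
        raise ValueError("CPU limit must be positive")
    return parse_cpu(usage) / limit_cores


def is_cpu_bottleneck(usage: str, limit: str, threshold: float = 0.8) -> bool:
    """Flag a container whose CPU usage is at or above the threshold.

    The 0.8 default is an assumed value; tune it for your own workloads.
    """
    return cpu_utilization(usage, limit) >= threshold
```

A container using "900m" against a limit of "1" would be flagged, while one using "100m" would not.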
Components of Kubernetes: What to Monitor
A Kubernetes cluster consists of a master node (the control plane) and separate worker nodes. Master components include:
etcd
Stores configuration information, which can be used by each node in the cluster.
API server (kube-apiserver)
Validates and configures data for API objects such as pods, services, replication controllers and more.
Scheduler (kube-scheduler)
Manages workload utilization and the allocation of pods to available nodes.
kube-controller-manager
A daemon that runs the core control loops, which watch the cluster state through the API server and work to move the current state toward the desired state.
cloud-controller-manager
Runs controllers that interact with underlying cloud provider(s).
Kubernetes node components include:
Kubernetes add-ons
You have many Kubernetes add-on components to choose from, but here are some popular selections.
Kubernetes Monitoring Challenges
Migrating traditional, monolithic applications to Kubernetes can be time-consuming and error-prone. Enterprises are willing to take this risk, though, to achieve greater agility, innovation, cost benefits, scalability and business growth in the cloud. But companies that migrate monolithic applications to microservices often lack visibility into the Kubernetes environment, which makes it difficult to see, in real time, how every microservice interacts with the others.
Kubernetes Is Complex
Another reason Kubernetes is difficult to monitor is that a Kubernetes cluster is considerably more complex than a monolithic deployment, with multiple servers and private and public cloud services, notes integration engineer Dave Snyder. When a problem starts, there are many logs, components and other data to investigate. A legacy monolithic environment might require a couple of log searches, but a Kubernetes environment may contain one or more logs for each of the microservices involved in the issue you're investigating.
Kubernetes Monitoring with APM
Kubernetes monitoring with an application performance monitoring solution gives organizations visibility into application and business performance, including deeper insights into containerized applications, Kubernetes clusters, Docker containers, and underlying infrastructure metrics.
This visibility allows enterprises to enhance container-level metrics and gain visibility into CPU, packet, memory and network utilization. Organizations can then baseline these metrics and associated health rules, as well as resource usage statistics on their APM-monitored container applications. By comparing APM metrics with the underlying container and server metrics, companies quickly gain insight into the performance of their containerized applications, and learn of potential impediments in the infrastructure stack. Specific metrics, for instance, can help identify both bandwidth-hogging applications and container-level network errors.
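To make the idea of baselining a metric and attaching a health rule more concrete, here is a rough sketch. It assumes a simple statistical baseline (mean and population standard deviation of historic samples) and a rule that fires when a new value exceeds the mean by more than a chosen number of standard deviations. The function names and the default tolerance of 3 are illustrative assumptions, not the behavior of any particular APM product.

```python
from statistics import mean, pstdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Summarize historic samples as (mean, population standard deviation)."""
    if not samples:
        raise ValueError("at least one sample is required")
    return mean(samples), pstdev(samples)


def violates_health_rule(
    value: float, baseline: tuple[float, float], tolerance: float = 3.0
) -> bool:
    """Return True when value is more than `tolerance` deviations above the mean.

    The tolerance of 3.0 is an assumed default for illustration only.
    """
    avg, spread = baseline
    return value > avg + tolerance * spread


# Hypothetical CPU utilization percentages collected during normal operation.
history = [40, 42, 38, 41, 39]
baseline = build_baseline(history)
```

With this history, a new reading well above the usual range would violate the rule, while one close to the historic mean would not.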
Full-Stack Visibility in Kubernetes Environments
Visibility allows organizations to monitor containerized applications running inside Kubernetes pods, and identify container issues that hamper application performance.
A comprehensive Kubernetes monitoring solution provides end-to-end visibility into every component of the organization's applications: infrastructure, Kubernetes platform, containers, and every microservice and end user device.
What are Kubernetes monitoring best practices?
Kubernetes introduces its own operational workflows and complexities, many of which involve application performance monitoring. As you expand the use of Kubernetes into your production environments, these challenges become even more significant.
By creating levels of abstraction such as pods and services, Kubernetes frees you from worrying about where your applications are running, or whether they have sufficient resources to run efficiently. But to ensure optimal performance, you still must monitor your applications, the containers that run them, and even Kubernetes itself.
Here are some important Kubernetes monitoring best practices:
Use Kubernetes DaemonSets
When running Kubernetes, you might want to run a single pod on all your nodes, such as when running a monitoring process like the AppDynamics agent or Fluentd, an open-source data collector, to collect logs. (The AppDynamics Standalone Machine Agent, which monitors containerized applications running inside Kubernetes pods, is deployed as a DaemonSet on every node in a Kubernetes cluster.)
A DaemonSet is a Kubernetes workload object that ensures a particular pod runs on every node in the cluster, or on some subset of nodes. By using a DaemonSet, you're telling Kubernetes to make sure there is one instance of the pod on every selected node.
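As an illustration of what such an object looks like, the sketch below builds a minimal DaemonSet manifest as a Python dictionary and prints it as JSON. The helper name `build_daemonset`, the `monitoring` namespace, and the image `example.com/log-collector:1.0` are hypothetical placeholders, and a real agent would normally need more fields (resource limits, volume mounts, tolerations) than are shown here.

```python
import json


def build_daemonset(name: str, image: str, namespace: str = "monitoring") -> dict:
    """Build a minimal apps/v1 DaemonSet manifest for a per-node agent.

    The selector must match the pod template labels, so both are derived
    from the same single "app" label.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "namespace": namespace, "labels": dict(labels)},
        "spec": {
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; substitute the agent you actually deploy.
    manifest = build_daemonset("log-collector", "example.com/log-collector:1.0")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML.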
Tags and Labels Matter—A Lot
With Kubernetes managing container orchestration, labels become critically important for monitoring because they are the primary way to group, select and query pods and containers. To make your metrics as useful as possible, it's essential to define your labels with a logical, consistent and coherent schema.
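The value of a consistent label schema shows up when you select workloads by label. The sketch below mimics an equality-based selector in plain Python. The pod records, label keys (`app`, `tier`, `env`) and function names are made up for illustration, and set-based selector expressions are not covered.

```python
def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Return True when every key/value pair in the selector is present in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select_pods(pods: list[dict], selector: dict[str, str]) -> list[str]:
    """Return the names of pods whose labels satisfy the selector."""
    return [pod["name"] for pod in pods if matches(selector, pod.get("labels", {}))]


# Hypothetical pods that follow one schema: app / tier / env.
pods = [
    {"name": "cart-1", "labels": {"app": "cart", "tier": "backend", "env": "prod"}},
    {"name": "cart-2", "labels": {"app": "cart", "tier": "backend", "env": "staging"}},
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend", "env": "prod"}},
]

prod_backend = select_pods(pods, {"tier": "backend", "env": "prod"})
```

Because every pod uses the same keys, a single selector can slice metrics by application, tier or environment without special cases.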
Know Which Metrics to Monitor
According to Kubernetes.io, several key types of Kubernetes metrics should be tracked closely:
Use Service Discovery
Since Kubernetes schedules applications dynamically based on scheduling policy, you may not know where your apps are running—but you'll have to monitor them anyway. You'll want to use a monitoring system with service discovery, which automatically adapts metric collection to moving containers. This approach allows you to continuously monitor your applications without interruption.
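The core of service discovery is a reconciliation loop: on each pass, the monitoring system lists the pods that currently exist, derives scrape targets from them, and adds or drops targets accordingly. The sketch below shows that logic on plain dictionaries. The pod fields (`phase`, `ip`, `metrics_port`) and the `monitor: "true"` opt-in label are assumptions for illustration, not a real Kubernetes or Prometheus convention.

```python
def discover_targets(pods: list[dict]) -> set[str]:
    """Derive "ip:port" scrape targets from the pods that are running right now."""
    targets = set()
    for pod in pods:
        # Skip pods that are not running yet or have no IP assigned.
        if pod.get("phase") != "Running" or not pod.get("ip"):
            continue
        # Hypothetical opt-in label; only labeled pods are scraped.
        if pod.get("labels", {}).get("monitor") != "true":
            continue
        targets.add(f'{pod["ip"]}:{pod["metrics_port"]}')
    return targets


def reconcile(current: set[str], discovered: set[str]) -> tuple[set[str], set[str]]:
    """Return (targets to start scraping, targets to stop scraping)."""
    return discovered - current, current - discovered


# Two snapshots of the same workload before and after it is rescheduled.
before = [{"phase": "Running", "ip": "10.0.0.5", "metrics_port": 8080,
           "labels": {"monitor": "true"}}]
after = [{"phase": "Running", "ip": "10.0.1.9", "metrics_port": 8080,
          "labels": {"monitor": "true"}}]

to_add, to_remove = reconcile(discover_targets(before), discover_targets(after))
```

When the pod moves to a new node and IP, the old target is dropped and the new one is picked up on the next pass, so collection continues without manual reconfiguration.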
Kubernetes Monitoring Tools
Kubernetes offers many benefits but adds complexity as well. For example, its ability to distribute containerized applications across multiple data centers—and even different cloud providers—requires a comprehensive monitoring solution to collect and aggregate metrics across many different sources.
Continuous monitoring of system and application health is essential, and many free and commercial solutions provide real-time monitoring of Kubernetes clusters and the applications they host. Here are several open source tools for Kubernetes monitoring:
Prometheus
This popular monitoring and alerting tool for Kubernetes and Docker provides detailed, actionable metrics and analysis. Developed by SoundCloud and donated to the CNCF community, Prometheus is designed specifically to monitor applications and microservices running in containers at scale. Prometheus is not a dashboard, however, and often is used in conjunction with Grafana (see below) to visualize data.
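Tools such as Grafana read from Prometheus through its HTTP API. As a rough sketch of what that exchange looks like, the code below builds the URL for an instant query against the `/api/v1/query` endpoint and parses a vector result. The base URL `http://localhost:9090` is an assumed local address, and the sample response is hand-written to follow the documented response shape rather than captured from a real server.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> list[tuple[dict, float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample carries a [timestamp, "value"] pair; the value is a string.
    return [(sample["metric"], float(sample["value"][1])) for sample in data["result"]]


# Hand-written sample in the shape of a successful response (not real data).
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "demo"}, "value": [1700000000, "1"]}
        ],
    },
})

url = instant_query_url("http://localhost:9090", "up")
samples = parse_vector(sample_body)
```

In practice the body would come from an HTTP GET on the generated URL; it is kept offline here so the sketch runs without a Prometheus server.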
Grafana
Grafana, an open source platform for analytics and metric visualization, includes four dashboards: Cluster, Node, Pod/Container and Deployment. Kubernetes admins often install Grafana and leverage the Prometheus data source to create information-rich dashboards.
Jaeger
Jaeger is a tracing system used to troubleshoot and monitor transactions in complex distributed systems. It addresses software issues that arise in distributed context propagation, distributed transaction monitoring, latency optimization and more.
Dashboard
Kubernetes Dashboard, a web UI add-on for Kubernetes clusters, allows you to monitor the health status of workloads. (For more details on Dashboard, see the "Kubernetes add-ons" section above.)
Kubewatch
This add-on monitors changes to Kubernetes pods and sends notifications to a Slack channel. Written in Golang, Kubewatch uses a Kubernetes client library to interact with the Kubernetes API server, and a Slack client library to interact with Slack.
Weave Scope
A visualization and monitoring tool for Kubernetes and Docker, Weave Scope offers a top-down view of your app and entire infrastructure. Developed by Weaveworks, Weave Scope generates a map of processes, containers and hosts in a Kubernetes cluster. Its graphical UI also allows you to manage and run diagnostic commands on containers.
EFK Stack
The EFK Stack is really a melange of three tools that work well together: Elasticsearch, Fluentd and Kibana. Fluentd is a data collector that culls logs from pods running on Kubernetes cluster nodes. It routes these logs to the Elasticsearch search engine, which ingests the data and stores it in a central repository. Kibana, a data visualization plugin for Elasticsearch, is the UI for the EFK Stack, allowing the user to visualize collected logs and metrics and create custom dashboards.
InfluxDB
InfluxData’s InfluxDB is a performant store for time series data. Built for very high volume storage of monitoring records, it provides horizontal scalability and high availability with clustering. InfluxDB is a good solution for long-term storage of Kubernetes monitoring data for historic records or modeling.