Navigating the Metrics: Understanding Host and Container Observations with Linux Tools

Introduction: In my last piece, 'Mastering Modern Tech: Microservices, Virtualization, and Containerization,' I addressed the real benefits of microservices from a system administrator's perspective, debunking some of the hype surrounding them. We saw how containers could tackle complex challenges like scalability and resilience, showcasing their utility beyond mere technological trends. Yet, it's crucial to understand that containers are not a universal solution.

While containers provide advantages such as quicker startup times than VMs, more efficient resource use, and simplified I/O operations, they introduce unique challenges. These include increased contention for kernel resources and a shared kernel environment, which can amplify the impact of issues across the system. Additionally, the inherent nature of containers often obscures deep system insights, which are vital for maintaining robust system health.

This article builds on these concepts to shed light on the use of traditional Linux tools in navigating from the host environment to the guest and into the container. By mapping these processes and examining the distinct observability challenges posed by containers and hypervisors, we aim to bridge the knowledge gap for those unfamiliar with the complexities of virtualized environments. Our goal is to equip you with the knowledge to effectively manage and monitor these layers, thereby enhancing the efficiency and reliability of your systems.

For this, I have an existing cluster that I'm going to use, which consists of 3 master nodes and 1 worker node.


In this cluster our hypervisor is KVM. Let's dive into how we navigate from the hypervisor level down to the containers; this journey is essential for anyone managing a virtualized environment.

Hardware virtualization:


Let's start with the top command from the host (hypervisor), as every administrator does ;)

top output

Although KVM isolates guests from each other, making them function like separate servers, you can still gather some information from the host.

The `qemu-system-x86` process represents a KVM guest, with one thread per vCPU and additional threads for I/O proxies. The total CPU usage for the guest is shown in the top output above, and individual vCPU usage can be analyzed using pidstat:

pidstat -r -u -p <pid> 1        

This output displays the CPU threads, labeled CPU 0/KVM and CPU 1/KVM, which utilize 21% and 19% of the CPU, respectively.
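For reference, the per-thread view can be requested explicitly with pidstat's -t option, which breaks the QEMU process out into its individual threads, including the CPU N/KVM vCPU threads. A minimal sketch, with a placeholder PID:

pidstat -t -u -p <pid> 1        # -t adds a line per thread, so each vCPU thread's %CPU is visible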

To associate QEMU processes with their corresponding guest instance names, one typically examines the process arguments (using ps -wwfp PID) to identify the -name option.
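As a quick sketch of that lookup (the PID here is a placeholder, not one from my cluster):

pgrep -a qemu                                         # list QEMU guest PIDs with their full command lines
ps -wwfp <pid> | tr ' ' '\n' | grep -x -A1 -- -name   # print the -name option and the value that follows it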

Additionally, we can analyze guest vCPU exits, which is crucial: the nature of these exits can indicate what a guest is doing, whether a vCPU is idle, engaged in I/O, or performing computational tasks. On Linux, the perf(1) kvm subcommand offers detailed statistics for KVM exits. For instance:


perf list
bpftrace
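Beyond listing tracepoints, perf's kvm stat subcommand can record and then summarize exits for a running guest. A minimal sketch, assuming your perf build includes the kvm subcommand, with a placeholder QEMU PID:

perf kvm stat record -p <qemu-pid>    # record VM-exit events for one guest; stop with Ctrl-C
perf kvm stat report                  # summarize the recorded exits, grouped by exit reason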


While observing the internal operations of a guest virtual machine might not be straightforward for an operator, analyzing exit events can reveal the impact of hardware virtualization on performance. A low count of exits, with a high proportion of them being HLT (halt) exits, indicates that the guest CPU is largely idle. Conversely, a high frequency of I/O operations, with numerous interrupts being generated and injected, suggests significant activity over the guest's virtual network interfaces and disks.

For those delving deeper into KVM’s intricacies, numerous tracepoints are available for detailed analysis. Using the `perf list | grep kvm` command on the host reveals a variety of KVM-related tracepoints, such as kvm_ack_irq, kvm_age_page, and kvm_exit. These tracepoints provide a granular look at KVM operations and are essential for advanced diagnostics.

For instance, by listing arguments for kvm:kvm_exit with `bpftrace -lv t:kvm:kvm_exit`, we can obtain detailed information about exit reasons, the guest's return instruction pointer, and additional metrics. This, coupled with kvm:kvm_entry, which fires when a guest is entered or resumed, allows us to measure both the duration of and the reasons for exits. Here I'm using a bpftrace tool from "BPF Performance Tools" that visualizes exit reasons as a histogram, providing a comprehensive view of KVM performance dynamics. The tool is open source and was created by Brendan Gregg for broader access and utilization.
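A stripped-down one-liner in the same spirit, not the full tool from the book: it keys the histogram by the numeric exit code only and measures the time from kvm_exit to the next kvm_entry on the same thread, assuming the kvm_exit tracepoint exposes an exit_reason argument (which the bpftrace -lv listing above lets you confirm):

bpftrace -e '
  tracepoint:kvm:kvm_exit  { @start[tid] = nsecs; @why[tid] = args->exit_reason; }   // mark exit time and reason
  tracepoint:kvm:kvm_entry /@start[tid]/ {
      $reason = @why[tid];
      @exit_ns[$reason] = hist(nsecs - @start[tid]);    // time spent handling the exit, bucketed per exit code
      delete(@start[tid]); delete(@why[tid]);
  }'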



Understanding the Output Format

- @exit_ns[exit_code, exit_reason]: This indicates the histogram is for a specific exit reason, where exit_code is a numeric identifier for the reason, and exit_reason is the description.

- Histogram Buckets: Each line within a section represents a range of durations (in nanoseconds) into which the VM exits of that type fell. The count and bar graph indicate how many exits landed in each time bucket. I only included three here.

IO_INSTRUCTION (exit_code 30):

  • These exits are triggered by I/O instructions, which are typically slower operations. It’s evident that a significant number of these exits take longer, up to 64,000 ns, which is acceptable considering this is being tested on my old laptop.

EPT_MISCONFIG (exit_code 49):

  • This indicates potential misconfiguration issues with the Extended Page Table (EPT), leading to excessive exits. Such behavior may point to configuration or compatibility problems in memory management that would require further investigation. However, since this is only a test environment, I will not delve deeper at this time. From my initial assessment, it appears to be a load-related issue.

PENDING_INTERRUPT (exit_code 7):

  • A high volume of these suggests that my VM is frequently interrupted by hardware or software interrupts, affecting performance.

I'd now like to move up a level to the VM and see what information I can gather from there, so I can better understand the workload and get a clearer picture of my environment.

top

While containers offer significant operational efficiencies and quick deployment, they typically operate under the same host VM's kernel, which exposes all container processes at the VM level, potentially compromising isolation. However, Kubernetes enhances container security through sophisticated orchestration capabilities, including service meshes and secret management. Service meshes manage secure service-to-service communication across the Kubernetes cluster, adding an additional layer of security and network control. Meanwhile, Kubernetes' secret management securely stores and handles sensitive information like passwords and API keys, mitigating risks associated with direct exposure. These features collectively help to address the security and isolation concerns in multi-tenant architectures where data privacy and compliance are critical. Nevertheless, while these tools improve security, the fundamental challenge of shared kernel architecture remains, emphasizing the need for careful configuration and management in environments dealing with sensitive data.

mpstat -P ALL 1

The output of mpstat provides detailed statistics on CPU utilization and performance for individual processors or processor cores, which assists administrators in analyzing and optimizing system behavior. I primarily use it to ensure that I don't have a 'busy neighbor' consuming excessive CPU time, particularly checking that the 'steal' time does not exceed 10% for more than one minute.
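One way to make that check mechanical is to let mpstat average over a full minute instead of printing per-second samples; a small sketch:

mpstat -P ALL 60 1        # a single 60-second interval; check %steal in the Average lines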

Since disk and network devices are virtualized, latency is a crucial metric to analyze, as it shows how these devices respond within the constraints of virtualization and the activity of other tenants. Metrics like percent busy are challenging to interpret without detailed knowledge of the underlying devices. Device latency can be thoroughly examined using kernel tracing tools such as BPF tools. For example, let's explore 'biosnoop', which utilizes BPF to provide insights into device latency.

The output shows the virtual disk device latency. Logging the output of biosnoop can help you examine the sequence of I/O and check whether any latency outliers are present; if they are, it's usually physical contention or a device issue.
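A sketch of that logging workflow, assuming the BCC version of biosnoop is installed (its path and exact column layout vary by distribution):

/usr/share/bcc/tools/biosnoop | tee biosnoop.log    # watch live and log; stop with Ctrl-C once you have enough samples
sort -k8 -nr biosnoop.log | head                    # assuming LAT(ms) is the 8th column, list the slowest I/O first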


OS virtualization:


OS virtualization in Linux partitions the operating system into container instances, functioning as independent guest servers with capabilities like individual administration and rebooting. These containers provide efficient, rapidly-booting environments ideal for cloud customers and high-density servers for cloud operators. The concept, originating from the Unix chroot command, evolved into more secure solutions like FreeBSD jails and Solaris Zones, incorporating extensive resource controls through Linux’s namespaces and cgroups. While containers offer significant performance benefits such as fast initialization and efficient memory usage, they also introduce challenges like increased kernel resource contention and reduced security due to the shared kernel architecture: a kernel panic affects every container on the host, and a container cannot run a different kernel version from its host.


In this discussion, I aim to compare the functionality of various monitoring and performance tools when used with virtual machines (VMs) and containers. It’s important to be aware that some tools, designed primarily for VMs, might inadvertently present host-level metrics rather than container-specific data. This can lead to misinterpretations unless one is fully aware of the environmental context in which these tools are applied. Accurate understanding and application of these tools are crucial for obtaining reliable insights and effectively managing system operations.

uptime


free


iostat -mdz 1


vmstat -Sm 1

You can see from those images that even within the container, you are observing metrics at the host level. Traditional tools such as ps, top, and so on have no support for Kubernetes or containerd, which makes sense: if they supported one container platform, they would have to support every other one. Instead, you can use the tools provided by the container platform, for example crictl stats:


crictl stats
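To narrow that down to a single container, look up its ID first; a small sketch, using the same container ID as the crictl inspect example further down:

crictl ps                        # list running containers and their IDs
crictl stats 1b3f3594f6d66       # CPU and memory for that one container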

From the host, you can also use the command:

systemd-cgtop



Mapping the cgroup to the container is not straightforward, but the easiest way is to use this command:

crictl inspect 1b3f3594f6d66 |grep cgroup        


From this output, you can see that systemd-cgtop indicates your container is not utilizing the CPU and is currently using 184 MB of RAM.
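If you want to cross-check that memory figure without systemd-cgtop, the same number can be read straight from the cgroup filesystem. This is only a sketch: it assumes cgroup v2 and that you substitute the cgroup path reported by crictl inspect above:

cat /sys/fs/cgroup/<cgroup path>/memory.current        # current memory usage of the container's cgroup, in bytes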

Now, how do we map a process ID on the host to the corresponding process ID inside the container?


This shows that process ID 41290 on the host is PID 1 inside the container, which is correct.


Note the matching mount namespace IDs (4026532814); this confirms that host PID 41290 is the container's PID 1, since both resolve to the same mount namespace.
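A sketch of how that evidence can be gathered from the host, using the same PID (41290) as in the screenshots:

grep NSpid /proc/41290/status    # host PID followed by its PID inside the container's PID namespace, i.e. 41290 then 1
readlink /proc/41290/ns/mnt      # prints the mount namespace ID, e.g. mnt:[4026532814] as seen above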

  • Namespaces can present similar challenges: creating a file in /tmp inside the container is not the same as /tmp on the host.


Apart from /proc files, the nsenter command can execute other commands in selected namespaces. See this example:

nsenter -t 41290 -m -p top        
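The same approach works for other namespaces; for example, inspecting the container's network stack from the host:

nsenter -t 41290 -n ss -tlnp     # list listening sockets inside the container's network namespace
nsenter -t 41290 -n ip addr      # show the container's interfaces and addresses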


In the end, understanding the tools and mechanisms that manage processes in hardware and OS virtualization is crucial for comprehensive system analysis. Initially, I often access the container directly to grasp the issue from a client's perspective. However, containers may not provide a complete view due to their inherent limitations. It's essential to know how to navigate from the container back to the host, mapping and correlating observations from both environments to obtain a holistic understanding. While the isolation provided by VMs comes with a performance cost, the lightweight nature of containers can also lead to gaps in visibility and control. This trade-off is precisely why we are witnessing the rise of lightweight VMs. Technologies like Firecracker represent this evolution, offering the security of traditional hardware virtualization but with the efficiency and agility of container environments. By embracing such innovations, we can address the limitations of both older VMs and containers, paving the way for more optimized and adaptable virtualization solutions.
