Performance Best Practices for VMware vSphere, Part 6

ESXi and Virtual Machines

ESXi CPU Considerations, Section 1

This subsection provides guidance regarding CPU considerations in VMware ESXi.

CPU virtualization adds varying amounts of overhead, depending on the percentage of the virtual machine’s workload that can be executed on the physical processor as-is and on the cost of virtualizing the remainder of the workload:

  • For many workloads, CPU virtualization adds only a very small amount of overhead, resulting in performance essentially comparable to native.
  • Many workloads to which CPU virtualization does add overhead are not CPU-bound; that is, most of their time is spent waiting for external events such as user interaction, device input, or data retrieval, rather than executing instructions. Because otherwise-unused CPU cycles are available to absorb the virtualization overhead, these workloads will typically have throughput similar to native, but potentially with a slight increase in latency.
  • For a small percentage of workloads, for which CPU virtualization adds overhead and which are CPU-bound, there might be a noticeable degradation in both throughput and latency.

The rest of this subsection lists practices and configurations recommended by VMware for optimal CPU performance.

  • In most environments ESXi allows significant levels of CPU overcommitment (that is, running more vCPUs on a host than the total number of physical processor cores in that host) without impacting virtual machine performance.

If an ESXi host becomes CPU saturated (that is, the virtual machines and other loads on the host demand all the CPU resources the host has), latency-sensitive workloads might not perform well. In this case you might want to reduce the CPU load, for example by powering off some virtual machines or migrating them to a different host (or allowing DRS to migrate them automatically).
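To get a quick view of how far a host is overcommitted, you can compare the configured vCPUs of its powered-on virtual machines against its physical core count. Below is a minimal sketch using the pyVmomi Python SDK; the vCenter address and credentials are placeholders you would substitute, and certificate verification is disabled purely for illustration:

```python
# Minimal sketch: report the vCPU-to-physical-core ratio for each host.
# Assumes pyVmomi is installed; VCENTER_HOST, USER, and PASSWORD are
# hypothetical placeholders, not values from the source document.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab use only; verify certificates in production
si = SmartConnect(host="VCENTER_HOST", user="USER", pwd="PASSWORD", sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        cores = host.hardware.cpuInfo.numCpuCores
        # Sum configured vCPUs across powered-on VMs on this host.
        vcpus = sum(vm.config.hardware.numCPU
                    for vm in host.vm
                    if vm.config is not None
                    and vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOn)
        ratio = vcpus / cores if cores else 0
        print(f"{host.name}: {vcpus} vCPUs / {cores} cores = {ratio:.1f}:1")
    view.Destroy()
finally:
    Disconnect(si)
```

The ratio alone does not indicate a problem; as noted above, it only becomes a concern when actual demand saturates the host's CPUs.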

  • It is a good idea to periodically monitor the CPU usage of the host, either through the vSphere Web Client or by using esxtop or resxtop. Below we describe how to interpret esxtop data (a small parsing sketch follows these points):

- If the load average on the first line of the esxtop CPU panel is equal to or greater than 1, this indicates that the system is overloaded.

- The usage percentage for the physical CPUs on the PCPU line can be another indication of a possibly overloaded condition. In general, 80% usage is a reasonable ceiling and 90% should be a warning that the CPUs are approaching an overloaded condition. However, organizations will have varying standards regarding the desired load percentage.
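As one way to automate the thresholds above, esxtop and resxtop can write their counters to a CSV file in batch mode (for example, esxtop -b -d 5 -n 12 > cpu.csv). The sketch below scans that file for the aggregate physical-CPU utilization column and flags the 80% and 90% levels; exact counter names vary between ESXi builds, so the column match is illustrative:

```python
# Minimal sketch: flag high physical-CPU utilization in esxtop batch output.
# The counter name matched below ("Physical Cpu(_Total) % Util Time") is
# illustrative; check the header row of your own capture.
import csv
import sys

WARN, CEILING = 80.0, 90.0

def check_pcpu_util(path):
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Locate the aggregate physical-CPU utilization column.
        try:
            col = next(i for i, name in enumerate(header)
                       if "Physical Cpu(_Total)" in name and "% Util Time" in name)
        except StopIteration:
            sys.exit("No 'Physical Cpu(_Total) % Util Time' column found")
        for row in reader:
            util = float(row[col])
            if util >= CEILING:
                print(f"{row[0]}: {util:.1f}% - CPUs approaching an overloaded condition")
            elif util >= WARN:
                print(f"{row[0]}: {util:.1f}% - above the 80% ceiling")

check_pcpu_util(sys.argv[1])  # usage: python check_pcpu.py cpu.csv
```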

  • Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use might cause slightly increased resource usage, potentially impacting performance on very heavily loaded systems. Common examples of this include a single-threaded workload running in a multiple-vCPU virtual machine or a multi-threaded workload in a virtual machine with more vCPUs than the workload can effectively use.

Even if the guest operating system doesn’t use some of its vCPUs, configuring virtual machines with those vCPUs still imposes some small resource requirements on ESXi that translate to real CPU consumption on the host. For example:

- Unused vCPUs still consume timer interrupts in some guest operating systems (though this is not true with “tickless timer” kernels).

- Maintaining a consistent memory view among multiple vCPUs can consume additional resources, both in the guest operating system and in ESXi.

  • Most guest operating systems execute an idle loop during periods of inactivity. Within this loop, most of these guest operating systems halt by executing the HLT or MWAIT instructions. Some older guest operating systems (including Windows 2000 (with certain HALs), Solaris 8 and 9, and MS-DOS), however, use busy-waiting within their idle loops. This results in the consumption of resources that might otherwise be available for other uses (other virtual machines, the VMkernel, and so on).

ESXi automatically detects these loops and de-schedules the idle vCPU. Though this reduces the CPU overhead, it can also reduce the performance of some I/O-heavy workloads. For additional information see VMware KB articles 1077 and 2231.

  • The guest operating system’s scheduler might migrate a single-threaded workload among multiple vCPUs, thereby losing cache locality (a simple in-guest mitigation is sketched below).

Together, these factors translate to real CPU consumption on the host.
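Where a single-threaded workload must run in a multi-vCPU virtual machine, one common in-guest mitigation is to pin the process to a single vCPU so the guest scheduler cannot migrate it. A minimal Linux-only sketch, equivalent to running the workload under taskset -c 0:

```python
# Minimal Linux-only sketch: pin the current (single-threaded) process to
# one vCPU so the guest scheduler cannot migrate it between vCPUs,
# preserving cache locality.
import os

os.sched_setaffinity(0, {0})   # pid 0 = this process; run only on vCPU 0
print("affinity:", os.sched_getaffinity(0))

# ... run the single-threaded workload here ...
```

Pinning is a workaround inside the guest; the cleaner fix, as noted above, is simply not to configure more vCPUs than the workload can use.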

  • Some workloads can easily be split across multiple virtual machines. In some cases, for the same number of vCPUs, more smaller virtual machines (sometimes called “scaling out”) will provide better performance than fewer larger virtual machines (sometimes called “scaling up”). In other cases the opposite is true, and fewer larger virtual machines will perform better. The variations can be due to a number of factors, including NUMA node sizes, CPU cache locality, and workload implementation details. The best choice can be determined through experimentation using your specific workload in your environment.

Reference:

Performance Best Practices for VMware vSphere 6.7, July 27, 2018
