Performance Best Practices for VMware vSphere, Part 2

Hardware Storage Considerations

Back-end storage configuration can greatly affect performance. Lower than expected storage performance is most often the result of configuration issues with underlying storage devices rather than anything specific to ESXi.

Storage performance is a vast topic that depends on workload, hardware, vendor, RAID level, cache size, stripe size, and so on. Consult the appropriate documentation from VMware as well as the storage vendor.

Many workloads are very sensitive to the latency of I/O operations. It is therefore important to have storage devices configured correctly. The remainder of this section lists practices and configurations recommended by VMware for optimal storage performance.

  • VMware Storage vMotion performance is heavily dependent on the available storage infrastructure bandwidth. We therefore recommend you consider the information in the “VMware vMotion and Storage vMotion” section of the referenced guide (page 65) when planning a deployment.
  • Consider providing flash devices for the vSphere Flash Infrastructure layer. This layer can be used to store a host swap file (as described in the “Memory Overcommit Techniques” section of the referenced guide, page 26) and for vSphere Flash Read Cache (vFRC) files.

The vSphere Flash Infrastructure layer can be composed of PCIe flash cards or SAS- or SATA-connected SSD drives, with the PCIe flash cards typically performing better than the SSD drives.
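
If you want to confirm which devices a host already reports as flash, and which are therefore candidates for the vSphere Flash Infrastructure layer, a short inventory script can help. The following pyVmomi sketch is a minimal example, assuming a reachable vCenter Server; the hostname and credentials shown are placeholders, not real values.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details; adjust for your environment.
    ctx = ssl._create_unverified_context()  # lab use only; validate certificates in production
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="secret", sslContext=ctx)
    content = si.RetrieveContent()

    hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        print(host.name)
        for lun in host.config.storageDevice.scsiLun:
            # Only disk objects carry the ssd/localDisk flags.
            if isinstance(lun, vim.host.ScsiDisk) and lun.ssd:
                kind = "local" if lun.localDisk else "shared"
                print("  flash device: %s (%s)" % (lun.canonicalName, kind))

    Disconnect(si)

Whether a flash device is PCIe- or SAS/SATA-attached is not captured by this output; check the device model and the vendor documentation for that distinction.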

  • Consider choosing PCIe flash cards that use the Non-Volatile Memory Express (NVMe) protocol.
  • 4K native (4Kn) and 512B emulation (512e) drives offer advantages (such as higher capacities) but also have limitations. In particular, such drives can perform poorly if your workloads don't issue mostly 4K-aligned I/Os. (A sketch for checking drive sector formats appears below.)
  • Consider choosing storage hardware that supports vStorage APIs for Storage Awareness (VASA), thus allowing you to use VVols.

Because VVol performance varies significantly between storage hardware vendors, make sure the storage hardware you choose will provide the VVol performance you expect.
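
As referenced above, the sector format of each disk can be checked programmatically. The following pyVmomi sketch prints the logical block size and, where the API exposes it, the reported disk type for every SCSI disk on each host. The connection details are placeholders, and the scsiDiskType attribute is read defensively because older host or SDK versions may not populate it.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details; adjust for your environment.
    ctx = ssl._create_unverified_context()  # lab use only; validate certificates in production
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="secret", sslContext=ctx)
    content = si.RetrieveContent()

    hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        print(host.name)
        for lun in host.config.storageDevice.scsiLun:
            if isinstance(lun, vim.host.ScsiDisk):
                # scsiDiskType (e.g. native 512, 512e, or 4Kn) may be absent on older versions.
                fmt = getattr(lun, "scsiDiskType", None) or "unknown"
                print("  %s  logical block size=%d  type=%s"
                      % (lun.canonicalName, lun.capacity.blockSize, fmt))

    Disconnect(si)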

  • Consider choosing storage hardware that supports VMware vStorage APIs for Array Integration (VAAI), allowing some operations to be offloaded to the storage hardware instead of being performed in ESXi.

Though the degree of improvement depends on the storage hardware, VAAI can improve storage scalability, reduce storage latency for several types of storage operations, reduce ESXi host CPU utilization for storage operations, and reduce storage network traffic.

On SANs, VAAI offers the following features:

  • Scalable lock management (sometimes called “hardware-assisted locking,” “Atomic Test & Set,” or ATS) replaces the use of SCSI reservations on VMFS volumes when performing metadata updates. This can reduce locking-related overheads, speeding up many administrative tasks as well as increasing I/O performance for thin VMDKs. ATS helps improve the scalability of very large deployments by speeding up provisioning operations such as expansion of thin disks, creation of snapshots, and other tasks.
  • Extended Copy (sometimes called “full copy,” “copy offload,” or XCOPY) allows copy operations to take place completely on the array, rather than having to transfer data to and from the host. This can dramatically speed up operations that rely on cloning, such as Storage vMotion, while also freeing CPU and I/O resources on the host.
  • Block zeroing (sometimes called “Write Same”) speeds up creation of eager-zeroed thick disks and can improve first-time write performance on lazy-zeroed thick disks and on thin disks.
  • Dead space reclamation (using the UNMAP command) allows hosts to convey to storage which blocks are no longer in use. On a LUN that is thin-provisioned on the array side this can allow the storage array hardware to reuse no-longer needed blocks.

NOTE: In this context, “thin provisioned” refers to LUNs on the storage array, as distinct from thin provisioned VMDKs.

On NAS devices, VAAI offers the following features:

  • Hardware-accelerated cloning (sometimes called “Full File Clone,” “Full Copy,” or “Copy Offload”) allows virtual disks to be cloned by the NAS device. This frees resources on the host and can speed up workloads that rely on cloning. (Note that Storage vMotion does not make use of this feature on NAS devices.)

  • Native Snapshot Support (sometimes called “Fast File Clone”) can create virtual machine linked clones or virtual machine snapshots using native snapshot disks instead of VMware redo logs. This feature, which requires virtual machines running on virtual hardware version 9 or later, offloads tasks to the NAS device, thus reducing I/O traffic and resource usage on the ESXi hosts.

NOTE: Initial creation of virtual machine snapshots with NAS native snapshot disks is slower than creating snapshots with VMware redo logs. To reduce this performance impact, we recommend avoiding heavy write I/O loads in a virtual machine while using NAS native snapshots to create a snapshot of that virtual machine. Similarly, creating linked clones using NAS native snapshots can be slightly slower than the same task using redo logs.

  • Reserve Space allows ESXi to fully preallocate space for a virtual disk at the time the virtual disk is created. Thus, in addition to the thin provisioning that non-VAAI NAS devices support, VAAI NAS devices also support lazy-zeroed thick provisioning and eager-zeroed thick provisioning.

  • Extended Statistics provides visibility into space usage on NAS datastores. This is particularly useful for thin-provisioned datastores, because it allows vSphere to display the actual usage of oversubscribed datastores.
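
To confirm that VAAI is actually in play in a given environment, it helps to check both the host-level acceleration settings and the per-device hardware-acceleration status. The following pyVmomi sketch does both; it is a minimal illustration with placeholder vCenter credentials. For per-primitive detail (ATS, clone, zero, and delete status) you can also run esxcli storage core device vaai status get directly on a host.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details; adjust for your environment.
    ctx = ssl._create_unverified_context()  # lab use only; validate certificates in production
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="secret", sslContext=ctx)
    content = si.RetrieveContent()

    # Host advanced settings governing the block (SAN) VAAI primitives; 1 means enabled.
    vaai_settings = ["DataMover.HardwareAcceleratedMove",   # XCOPY / full copy
                     "DataMover.HardwareAcceleratedInit",   # Write Same / block zeroing
                     "VMFS3.HardwareAcceleratedLocking"]    # ATS

    hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        print(host.name)
        opts = host.configManager.advancedOption
        for key in vaai_settings:
            for opt in opts.QueryOptions(key):
                print("  %s = %s" % (opt.key, opt.value))
        for lun in host.config.storageDevice.scsiLun:
            # vStorageSupport reports whether the device claims hardware acceleration support.
            status = getattr(lun, "vStorageSupport", None) or "unknown"
            print("  %s: %s" % (lun.canonicalName, status))

    Disconnect(si)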

  • If you plan to use VMware vSAN, consider the advantages of an all-flash vSAN deployment versus a hybrid deployment.
  • If you plan to use virtual machine encryption consider choosing hardware that supports the AES-NI processor instruction set extensions.
  • Performance design for a storage network must take into account the physical constraints of the network, not logical allocations. Using VLANs or VPNs does not provide a suitable solution to the problem of link oversubscription in shared configurations. VLANs and other virtual partitioning of a network provide a way of logically configuring a network, but don't change the physical capabilities of links and trunks between switches. VLANs and VPNs do, however, allow the use of network Quality of Service (QoS) features that, while not eliminating oversubscription, do provide a way to allocate bandwidth preferentially or proportionally to certain traffic.
  • Make sure that end-to-end Fibre Channel speeds are consistent to help avoid performance problems. (A sketch for spot-checking HBA speeds appears after this list.)
  • If needed, configure the maximum queue depth for your Fibre Channel HBA cards.
  • Applications or systems that write large amounts of data to storage, such as data acquisition or transaction logging systems, should not share Ethernet links to a storage device with other applications or systems. These types of applications perform best with dedicated connections to storage devices.
  • For iSCSI and NFS, make sure that your network topology does not contain Ethernet bottlenecks, where multiple links are routed through fewer links, potentially resulting in oversubscription and dropped network packets. Any time a number of links transmitting near capacity are switched to a smaller number of links, such oversubscription is a possibility. Recovering from these dropped network packets results in large performance degradation. In addition to time spent determining that data was dropped, the retransmission uses network bandwidth that could otherwise be used for new transactions.
  • Be aware that with software-initiated iSCSI and NFS the network protocol processing takes place on the host system, and thus these might require more CPU resources than other storage options.
  • Local storage performance might be improved with write-back cache. If your local storage has write-back cache installed, make sure it’s enabled and contains a functional battery module. For more information, see VMware KB article 1006602.
  • Make sure storage adapter cards are installed in slots with enough bandwidth to support their expected throughput. Be careful to distinguish between similar-sounding—but potentially incompatible—bus architectures, including PCI, PCI-X, PCI Express (PCIe), and PCIe 3.0 (aka PCIe Gen3), and be sure to note the number of “lanes” for those architectures that can support more than one width. For example, in order to supply their full bandwidth potential, single-port 32Gb/s Fibre Channel HBA cards would need to be installed in at least PCIe Gen2 x8 or PCIe Gen3 x4 slots (either of which is capable of a net maximum of 32Gb/s in each direction) and dual-port 32Gb/s Fibre Channel HBA cards would need to be installed in at least PCIe Gen3 x8 slots (which are capable of a net maximum of 64Gb/s in each direction).

These high-performance cards will typically function just as well in slower PCIe slots, but their maximum throughput can be limited by the slots’ available bandwidth. This is most relevant for workloads that make heavy use of large block size I/Os, as this is where these cards tend to reach their highest throughput.
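
The slot-bandwidth figures above can be reproduced with a quick back-of-the-envelope calculation that accounts for PCIe encoding overhead. The short Python sketch below is illustrative arithmetic only; real-world throughput will be somewhat lower due to protocol overhead.

    # Approximate net one-way PCIe bandwidth per lane, after encoding overhead.
    GEN2_GBPS_PER_LANE = 5.0 * 8 / 10     # 5 GT/s with 8b/10b encoding    -> 4.0 Gb/s
    GEN3_GBPS_PER_LANE = 8.0 * 128 / 130  # 8 GT/s with 128b/130b encoding -> ~7.88 Gb/s

    for label, per_lane, lanes in [("PCIe Gen2 x8", GEN2_GBPS_PER_LANE, 8),
                                   ("PCIe Gen3 x4", GEN3_GBPS_PER_LANE, 4),
                                   ("PCIe Gen3 x8", GEN3_GBPS_PER_LANE, 8)]:
        print("%s: ~%.1f Gb/s per direction" % (label, per_lane * lanes))
    # Prints roughly 32.0, 31.5, and 63.0 Gb/s, matching the approximate 32/64 Gb/s figures above.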
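
For the Fibre Channel items above, a quick inventory of each host's FC HBAs makes it easier to spot mismatched link speeds. The following pyVmomi sketch lists the model, driver, and the speed value reported by the API for each FC adapter; the connection details are placeholders. Note that changing the HBA queue depth itself is done through driver module parameters (for example with esxcli system module parameters set on the host), and the module and parameter names are vendor-specific, so follow the HBA vendor's and VMware's documentation for the exact values.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    # Placeholder connection details; adjust for your environment.
    ctx = ssl._create_unverified_context()  # lab use only; validate certificates in production
    si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                      pwd="secret", sslContext=ctx)
    content = si.RetrieveContent()

    hosts = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
    for host in hosts.view:
        print(host.name)
        for hba in host.config.storageDevice.hostBusAdapter:
            if isinstance(hba, vim.host.FibreChannelHba):
                # Compare the reported speed across hosts, switch ports, and array ports.
                print("  %s  model=%s  driver=%s  reported speed=%s"
                      % (hba.device, hba.model, hba.driver, hba.speed))

    Disconnect(si)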

Reference:

Performance Best Practices for VMware vSphere 6.7, July 27, 2018

#virtualization #performance #vmware
