Selecting the right size Azure VM for SQL workloads

Azure VMs use these naming conventions to denote varying features and specifications.

Naming convention explanation

[Family] + [Sub-family]* + [# of vCPUs] + [Constrained vCPUs]* + [Additive Features] + [Accelerator Type]* + [Version]

Value                Explanation
Family               Indicates the VM family series
*Sub-family          Used for specialized VM differentiations only
# of vCPUs           Denotes the number of vCPUs of the VM
*Constrained vCPUs   Used for certain VM sizes only; denotes the number of
                     vCPUs for the constrained-vCPU-capable size
Additive Features    Lowercase letters denote additive features, such as:
                     a = AMD-based processor
                     b = block storage performance
                     d = diskful (that is, a local temp disk is present); this feature is for newer Azure VMs, see the Ddv4 and Ddsv4-series
                     i = isolated size
                     l = low memory; a lower amount of memory than the memory-intensive size
                     m = memory intensive; the most memory in a particular size
                     p = Arm-based processor
                     t = tiny memory; the smallest amount of memory in a particular size
                     s = Premium Storage capable, including possible use of Ultra SSD (note: some newer sizes without the s attribute, such as M128 and M64, can still support Premium Storage)
                     C = confidential
                     NP = node packing
*Accelerator Type    Denotes the type of hardware accelerator in the specialized/GPU SKUs. Only the specialized/GPU SKUs launched from Q3 2020 onward have the hardware accelerator in the name.
Version              Denotes the version of the VM family series
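The decomposition above can be sketched with a small parser. This is purely illustrative, not an official Azure tool: the regular expression below is my own and ignores edge cases such as the accelerator-type segment in some GPU SKU names.

```python
import re

# Rough sketch of the naming convention above. Not an official Azure
# parser; real SKU names have edge cases this pattern ignores.
PATTERN = re.compile(
    r"^(?:Standard_)?"
    r"(?P<family>[A-Z]+)"          # family (sub-family letters included)
    r"(?P<vcpus>\d+)"              # number of vCPUs
    r"(?:-(?P<constrained>\d+))?"  # constrained vCPUs, if any
    r"(?P<features>[a-z]*)"        # additive feature letters
    r"(?:_v(?P<version>\d+))?$"    # version suffix
)

def parse_vm_size(name: str) -> dict:
    m = PATTERN.match(name)
    if not m:
        raise ValueError(f"unrecognized VM size name: {name}")
    return m.groupdict()

print(parse_vm_size("Standard_E4bds_v5"))
# {'family': 'E', 'vcpus': '4', 'constrained': None, 'features': 'bds', 'version': '5'}
```

For example, Standard_E4bds_v5 breaks down into the E family, 4 vCPUs, the feature letters b (block storage performance), d (diskful), and s (Premium Storage capable), and version 5.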



Azure VM Size Considerations

Understand your workload:

  • CPU utilization
  • Memory utilization
  • Storage space requirement
  • Disk usage metrics (I/O and throughput)

Size for what you need and use. You're essentially renting, so scale when you grow.


Azure VM Size Considerations

Azure VMs have limits on:

  • Number of cores
  • Amount of memory
  • Ephemeral disk support and size
  • Number of disks
  • I/O and throughput
  • Network bandwidth

Checklist

Review the following checklist for a brief overview of the VM size best practices that the rest of the article covers in greater detail:

  • The new Ebdsv5-series provides the highest I/O throughput-to-vCore ratio in Azure along with a memory-to-vCore ratio of 8. This series offers the best price-performance for SQL Server workloads on Azure VMs. Consider this series first for most SQL Server workloads.
  • Use VM sizes with 4 or more vCPUs, like the E4ds_v5 or higher.
  • Use memory-optimized virtual machine sizes for the best performance of SQL Server workloads.
  • The Edsv5 series and the M- and Mv2-series offer the optimal memory-to-vCore ratio required for OLTP workloads.
  • The M-series VMs offer the highest memory-to-vCore ratio in Azure. Consider these VMs for mission-critical and data warehouse workloads.
  • Use Azure Marketplace images to deploy your SQL Server virtual machines, as the SQL Server settings and storage options are configured for optimal performance.
  • Collect the target workload's performance characteristics and use them to determine the appropriate VM size for your business.
  • Use the Data Migration Assistant and SKU recommendation tools to find the right VM size for your existing SQL Server workload.
  • Use Azure Data Studio to migrate to Azure.


How does disk performance work?

Azure virtual machines have input/output operations per second (IOPS) and throughput performance limits based on the virtual machine type and size. OS disks and data disks can be attached to virtual machines. The disks have their own IOPS and throughput limits.

Your application's performance gets capped when it requests more IOPS or throughput than what is allotted for the virtual machines or attached disks. When capped, the application experiences suboptimal performance. This can lead to negative consequences like increased latency. Let's run through a couple of examples to clarify this concept. To make these examples easy to follow, we'll only look at IOPS. But, the same logic applies to throughput.

Disk IO capping

Setup:

  • Standard_D8s_v3
      o Uncached IOPS: 12,800
  • E30 OS disk
      o IOPS: 500
  • Two E30 data disks
      o IOPS: 500 each



The application running on the virtual machine makes a request that requires 10,000 IOPS. The VM allows all of them, because a Standard_D8s_v3 virtual machine can execute up to 12,800 IOPS.

The 10,000 IOPS requested are broken down into three requests to the different disks:

  • 1,000 IOPS are requested from the operating system disk.
  • 4,500 IOPS are requested from each data disk.

All attached disks are E30 disks and can only handle 500 IOPS, so each responds with 500 IOPS. The application's performance is capped by the attached disks and it can only process 1,500 IOPS. The application could achieve its peak of 10,000 IOPS if better-performing disks were used, such as Premium SSD P30 disks.
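The arithmetic above can be sketched in a few lines of Python. This is an illustrative model under the example's assumptions (a fixed split of requests across disks); real Azure throttling is more nuanced.

```python
def effective_iops(vm_limit, requests, disk_limits):
    """Serve per-disk IOPS requests under a VM-level cap and per-disk caps.

    requests[i] is the IOPS aimed at disk i, which can serve at most
    disk_limits[i]. Illustrative only; real Azure throttling differs.
    """
    total = sum(requests)
    # VM-level cap: scale all requests down proportionally if the VM
    # cannot issue that many IOPS.
    if total > vm_limit:
        requests = [r * vm_limit / total for r in requests]
    # Disk-level cap: each disk serves at most its own limit.
    return sum(min(r, cap) for r, cap in zip(requests, disk_limits))

# The walkthrough above: 10,000 IOPS requested on a Standard_D8s_v3
# (12,800 uncached IOPS), split 1,000 / 4,500 / 4,500 across one E30
# OS disk and two E30 data disks, each capped at 500 IOPS.
print(effective_iops(12_800, [1_000, 4_500, 4_500], [500, 500, 500]))  # 1500
```

Swapping the E30 limits for P30 limits (5,000 IOPS each) in the same call shows the full 10,000 IOPS being served.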

Virtual machine IO capping

Setup:

  • Standard_D8s_v3
      o Uncached IOPS: 12,800
  • P30 OS disk
      o IOPS: 5,000
  • Two P30 data disks
      o IOPS: 5,000 each

The application running on the virtual machine makes a request that requires 15,000 IOPS. Unfortunately, the Standard_D8s_v3 virtual machine is only provisioned to handle 12,800 IOPS, so the application is capped by the virtual machine limits and must work within the allotted 12,800 IOPS.

Those 12,800 IOPS are broken down into three requests to the different disks:

  • 4,267 IOPS are requested from the operating system disk.
  • 4,266 IOPS are requested from each data disk.

All attached disks are P30 disks that can handle 5,000 IOPS, so they serve the requested amounts in full.
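Both walkthroughs come down to the same bound: the IOPS the application actually receives is limited by its own request, the VM cap, and the combined disk caps, whichever bites first. A rough sketch of that rule (a simplification that assumes each disk is asked for at least its own limit whenever the total is capped):

```python
def served_iops(requested, vm_limit, disk_limits):
    # The application receives whichever bound bites first:
    # its own request, the VM's limit, or the combined disk limits.
    return min(requested, vm_limit, sum(disk_limits))

print(served_iops(10_000, 12_800, [500, 500, 500]))        # 1500  (disk-capped)
print(served_iops(15_000, 12_800, [5_000, 5_000, 5_000]))  # 12800 (VM-capped)
```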

Let's explain IOPS (input/output operations per second) in a fun and easy way.

Imagine your computer is like a chef in a busy restaurant, and IOPS is like how fast the chef can chop ingredients.

The Chef's Speed (IOPS):

Think of IOPS as the chef's chopping speed. When the chef is preparing dishes, they need to chop vegetables, meat, and other ingredients. The faster they can chop, the quicker they can cook and serve meals to hungry customers.

Busy Restaurant (Your Computer):

Now, imagine the restaurant is really busy, with lots of customers coming in. Each customer represents a task or program running on your computer, like opening an app, saving a file, or loading a webpage.

Chopping Efficiency (IOPS Importance):

The chef's chopping speed (IOPS) is crucial because if they chop too slowly, customers have to wait a long time for their food. In computer terms, if your computer has low IOPS, tasks and programs take a long time to complete because it can't process them quickly.

Chopping Superpower (High IOPS):

But here's the fun part: if the chef has a superpower that makes them chop incredibly fast, they can handle a lot of orders very quickly. In the same way, a computer with high IOPS can handle many tasks and programs simultaneously without slowing down. It's like having a super chef in the kitchen!

So, IOPS is like the superpower of your computer's "chef." The higher the IOPS, the faster your computer can process tasks and serve you, just like a super-speedy chef can serve up delicious meals in no time.

Now let's dive into IOPS bursting and caching in Azure data disks with a similar analogy.

Imagine Azure as a Magical Library:

Azure is like a magical library where you store your books, and each book is stored on a special shelf called a "data disk." These books represent your important data, and you want to access them as quickly as possible.

1. IOPS Bursting (Magical Speedy Reading Glasses):

IOPS bursting in Azure data disks is like having magical speedy reading glasses for your books. These glasses (IOPS bursting) allow you to read your books super fast, even faster than your regular reading speed.

But here's the catch: the glasses only work for a limited time! Imagine you can wear them for a short burst when you really need to read a book quickly. This burst of speed (extra IOPS) is helpful when you suddenly have a lot of readers (data requests) and want to serve them quickly.

2. Enabling Caching (The Library Assistant):

Now, let's talk about enabling caching on your data disk, which is like having a helpful library assistant. This assistant makes a copy of the most popular books (data) and keeps them right next to your reading spot (the VM).

When you want to read a book you've read recently, the library assistant hands it to you right away, without you having to go to the shelf (data disk) to get it. This is caching: it stores frequently used data closer to your VM so you can access it faster.

Think of caching as your library assistant helping you find and read your favorite books (data) without having to go to the shelves (data disk) every time. It saves you time and effort!

So, in Azure, IOPS bursting gives you a burst of speed when you need it, like magical speedy reading glasses, and enabling caching is like having a friendly library assistant who keeps your favorite books right next to you for quick access. Together, they make sure you can read and use your data as fast as possible in the magical library of Azure!

The setup below revisits the earlier virtual machine example, this time with host caching enabled.

Setup:

  • Standard_D8s_v3
      o Cached IOPS: 16,000
      o Uncached IOPS: 12,800
  • P30 OS disk
      o IOPS: 5,000
      o Host caching: Read/write
  • Two P30 data disks
      o IOPS: 5,000 each
      o Host caching: Read/write

The application uses a Standard_D8s_v3 virtual machine with host caching enabled and makes a request for 16,000 IOPS. The requests complete as soon as they are read from or written to the cache; writes are then lazily flushed to the attached disks.
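As a toy model (not Azure's actual caching semantics), you can think of cache hits drawing on the higher cached-IOPS limit while misses draw on the uncached limit. The hit_ratio parameter below is hypothetical, introduced only to make the two limits visible side by side.

```python
def served_iops_with_cache(requested, hit_ratio, cached_limit, uncached_limit):
    """Toy model: cache hits draw on the cached-IOPS limit, misses on
    the uncached one. hit_ratio is hypothetical, for illustration."""
    hits = requested * hit_ratio
    misses = requested - hits
    return min(hits, cached_limit) + min(misses, uncached_limit)

# The setup above: 16,000 IOPS requested, all served from cache.
print(served_iops_with_cache(16_000, 1.0, 16_000, 12_800))  # 16000.0
# The same request with no cache hits falls back to the uncached limit.
print(served_iops_with_cache(16_000, 0.0, 16_000, 12_800))  # 12800.0
```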


Conclusion

IOPS is the unit of measurement that disk manufacturers and storage vendors provide, and that is fine. When it comes to testing storage, however, we tend to focus more on throughput numbers. Converting between the two is just a math problem, but unless you are in the business of benchmarking storage and routinely translate IOPS into throughput based on block size, it can be confusing.

What is troubling to me is that the restriction on throughput is not clear when you select a VM size. The unit of measurement for storage IO is IOPS. At 3,200 IOPS with a 64 KB block size, I could expect around 200 MBps, yet my VM was limited to 48 MBps. Many IT professionals have discovered disk performance issues and scaled their storage to larger, faster disks (more IOPS) expecting better performance, only to find that it didn't solve the problem: the size of the VM was limiting their throughput. Scaling up to a larger VM would solve the issue, but that comes at a cost.
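The math is a single multiplication either way. A short helper (the function names are mine, for illustration) makes the 3,200 IOPS at 64 KB example and the 48 MBps VM cap concrete:

```python
def mbps(iops, block_size_kb):
    # Throughput is IOPS times block size; 1 MB = 1,024 KB here.
    return iops * block_size_kb / 1024

def iops_at(mbps_value, block_size_kb):
    # The inverse: how many IOPS a throughput cap allows at a block size.
    return mbps_value * 1024 / block_size_kb

print(mbps(3_200, 64))   # 200.0 -> roughly 200 MBps at a 64 KB block size
print(iops_at(48, 64))   # 768.0 -> a 48 MBps VM cap allows only 768 such IOPS
```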

References:

https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-checklist?view=azuresql

https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/performance-guidelines-best-practices-vm-size?view=azuresql

https://learn.microsoft.com/en-us/sql/dma/dma-sku-recommend-sql-db?view=sql-server-ver16

https://learn.microsoft.com/en-us/azure/azure-sql/virtual-machines/windows/pricing-guidance?view=azuresql

https://learn.microsoft.com/en-us/azure/virtual-machines/disk-bursting
