Performance Benchmarking Microsoft’s Cobalt 100 VMs on Databricks

Rahul Soni

Senior Solutions Architect at Databricks

Published: November 4, 2024

In the ever-evolving landscape of cloud computing, performance is king. As data scientists and engineers, we constantly seek ways to optimize workflows and extract maximum value from our infrastructure.

In this post, we take a deep dive into the performance comparison of Cobalt 100 and x86 VMs on Databricks using TPC-DS datasets. We aim to understand how these VM types — Cobalt 100 (Standard_d4pds_v6) and x86 (Standard_d4ds_v5) — perform under different data loads. We conducted the benchmarks with both 10GB and 100GB TPC-DS datasets.

The Experiment

VM Types

I picked VMs of the same size for both categories. Each has 4 CPU cores and 16 GB of memory.

Cobalt 100 VM type: Standard_d4pds_v6
x86 VM type: Standard_d4ds_v5

Cluster Configurations

Workers — 2
Driver — 1
Databricks Runtime version — 15.4 LTS
Photon disabled
Autoscaling disabled
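
As a rough illustration, the settings above could be written down as two cluster specs that differ only in VM type. This is a sketch, not the configuration actually used: the field names follow the Databricks Clusters API as I understand it, the `spark_version` string for 15.4 LTS and the cluster names are assumptions, the driver is assumed to use the same VM type as the workers, and the node type strings are copied from this post (check the exact spelling offered in your workspace).

```python
import json


def cluster_spec(name: str, node_type_id: str) -> dict:
    """Build a fixed-size cluster spec mirroring the settings listed above."""
    return {
        "cluster_name": name,                   # hypothetical name
        "spark_version": "15.4.x-scala2.12",    # assumed key for DBR 15.4 LTS
        "node_type_id": node_type_id,           # worker VM type
        "driver_node_type_id": node_type_id,    # assumption: driver matches workers
        "num_workers": 2,                       # fixed size, so no "autoscale" block
        "runtime_engine": "STANDARD",           # i.e. Photon disabled
    }


specs = [
    cluster_spec("tpcds-cobalt100", "Standard_d4pds_v6"),
    cluster_spec("tpcds-x86", "Standard_d4ds_v5"),
]
print(json.dumps(specs, indent=2))
```

Keeping everything except the VM type identical is what makes the two runs comparable.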

See the screenshots of both the cluster setups below.

Benchmarking Methodology

We employed the TPC-DS benchmark dataset, a standard for evaluating performance in data processing systems. The tests were conducted on both 10GB and 100GB datasets. Each query set was executed five times per VM type, with the median runtime selected to mitigate any anomalies or edge cases.
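
The median-of-five step can be sketched as follows. This assumes the median is taken per query across the five runs; the query names and timings are illustrative placeholders, not measured results.

```python
from statistics import median


def median_runtimes(runs_by_query: dict[str, list[float]]) -> dict[str, float]:
    """Collapse repeated runs of each query into a single median runtime (seconds)."""
    return {query: median(runs) for query, runs in runs_by_query.items()}


# Placeholder timings for two hypothetical queries, five runs each.
example_runs = {
    "q_a": [12.0, 11.5, 30.0, 11.8, 12.1],  # one slow outlier run
    "q_b": [5.0, 5.2, 4.9, 5.1, 5.0],
}

medians = median_runtimes(example_runs)
print(medians)
```

Unlike the mean, the median of `q_a` is not pulled upward by the single 30-second run, which is the reason for using it here.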

Benchmark Results for the 100GB Dataset

The benchmark consisted of ~120 different queries on both the Cobalt 100 and x86-based machines. Below are some key results showing the median runtimes for each machine across various queries:

The results from the 100GB dataset highlight significant performance differences between the two VM types. Let’s look at the details.

Overall performance comparison

For the 100GB TPC-DS dataset, Cobalt 100 VMs were ~18% faster in total runtime.

Below are the total runtimes for each VM type:

Cobalt 100: 3,644 seconds
x86: 4,450 seconds
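
For readers who want to check the arithmetic, here is a small sketch using the two totals reported above. It assumes the ~18% figure is the reduction in total runtime relative to x86; the same numbers expressed as a speedup ratio look slightly different.

```python
cobalt_seconds = 3644  # total runtime on Cobalt 100 (from the post)
x86_seconds = 4450     # total runtime on x86 (from the post)

# Runtime reduction relative to x86: how much less time Cobalt 100 needed.
runtime_reduction = (x86_seconds - cobalt_seconds) / x86_seconds

# Speedup ratio: how many times faster the whole suite finished.
speedup = x86_seconds / cobalt_seconds

print(f"Runtime reduction: {runtime_reduction:.1%}")  # 18.1%
print(f"Speedup: {speedup:.2f}x")                     # 1.22x
```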

Here’s some detailed analysis.

93 out of 118 queries ran faster on the Cobalt 100 VMs than on the x86 machines.
25 queries ran faster on the x86 machines than the Cobalt 100 VMs.

Query performance distribution (100 GB dataset)

This translates into ~79% of queries performing better on Cobalt 100 VMs.
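
The share can be reproduced from the two win counts. A minimal sketch, using only the counts reported in this post:

```python
cobalt_wins = 93  # queries faster on Cobalt 100 (100 GB dataset)
x86_wins = 25     # queries faster on x86 (100 GB dataset)
total_queries = cobalt_wins + x86_wins

cobalt_share = cobalt_wins / total_queries
x86_share = x86_wins / total_queries

print(f"{total_queries} queries in total")
print(f"Cobalt 100 faster: {cobalt_share:.1%}")  # 78.8%
print(f"x86 faster: {x86_share:.1%}")            # 21.2%
```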
Here are the top gainers on Cobalt 100 VMs.

Top performing queries on Cobalt 100 (100 GB dataset)

Top performance improvements (100 GB dataset)

Performance improvement distribution (100 GB dataset)

I analyzed the top 3 queries. They involve complex analytical operations such as CTEs, window functions, and multiple subqueries, which are generally memory-intensive.
At the other end of the spectrum, the five queries below fared best on the x86 machines.

Top-performing queries on x86 (100 GB dataset)

Benchmark Results for the 10GB Dataset

The performance trends observed in the smaller dataset align with the 100GB dataset benchmark results. The performance delta is a bit narrower, though. Details below.

Overall Performance: Similar to the larger dataset, Cobalt 100 VMs generally outperformed x86 VMs.
Cobalt 100 VMs are ~13% faster compared to the x86 machines. See the screenshot below.

Overall performance comparison for the 10GB dataset

87 out of 118 queries ran faster on Cobalt VMs. The number was 93 for the 100GB dataset.
Here are the top queries on Cobalt 100 VMs.

Top performing queries on Cobalt 100 (10 GB dataset)

Some outliers ran faster on Cobalt 100 during the TPC-DS 100GB benchmark but flipped during the 10GB benchmark, where they ran faster on x86. See below.

However, the runtimes for these queries on the two VM types are very close in the 10GB benchmark.
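
One way to find such flips is to compare the per-query winner at both dataset sizes and report how large the gap is at 10GB. The sketch below assumes per-query median runtimes are available as (Cobalt 100, x86) pairs; the function names, query names, and numbers are hypothetical placeholders, not measured results.

```python
def faster_vm(cobalt_s: float, x86_s: float) -> str:
    """Return which VM type had the lower runtime (ties go to x86)."""
    return "cobalt" if cobalt_s < x86_s else "x86"


def flipped_queries(
    results_100gb: dict[str, tuple[float, float]],
    results_10gb: dict[str, tuple[float, float]],
) -> list[tuple[str, float]]:
    """Queries that Cobalt 100 won at 100GB but lost at 10GB.

    Each result maps query -> (cobalt_seconds, x86_seconds). The returned
    list pairs each flipped query with its relative runtime gap at 10GB,
    sorted from smallest gap to largest.
    """
    flipped = []
    for query, (c100, x100) in results_100gb.items():
        if query not in results_10gb:
            continue
        c10, x10 = results_10gb[query]
        if faster_vm(c100, x100) == "cobalt" and faster_vm(c10, x10) == "x86":
            gap = abs(c10 - x10) / min(c10, x10)
            flipped.append((query, gap))
    return sorted(flipped, key=lambda item: item[1])


# Placeholder medians in seconds: (cobalt, x86).
results_100gb = {"q_a": (40.0, 50.0), "q_b": (30.0, 36.0), "q_c": (20.0, 18.0)}
results_10gb = {"q_a": (10.0, 12.0), "q_b": (8.2, 8.0), "q_c": (5.0, 4.5)}

flipped = flipped_queries(results_100gb, results_10gb)
for query, gap in flipped:
    print(f"{query}: flipped to x86 at 10GB, gap {gap:.1%}")
```

A small gap for a flipped query suggests the change of winner is within run-to-run noise rather than a real reversal.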

Key Takeaways

Overall Performance Boost: Cobalt 100 VMs finished the full query set ~18% faster on the 100GB dataset and ~13% faster on the 10GB dataset, potentially offering substantial efficiency gains for large-scale data processing tasks.
Query-Dependent Benefits: The performance improvements varied widely across different queries, indicating that the benefits of Cobalt 100 may be more pronounced for certain types of operations or data patterns.
Optimization Opportunities: The mixed results suggest that there may be opportunities for query optimization to take full advantage of Cobalt 100’s architecture.

Conclusion

Though it’s still early days for the Cobalt 100 VMs, in our initial benchmarking Cobalt 100 outperformed x86 in total runtime on both the 10GB and 100GB TPC-DS datasets, with the larger gain on the 100GB dataset and on complex, memory-intensive queries.

As organizations look to optimize performance in the cloud, Cobalt 100 VMs offer a compelling alternative for Databricks workloads. Their modern architecture provides not only performance benefits but also potentially better cost-efficiency.

As with any technology decision, it’s crucial to conduct your own benchmarks with representative workloads to determine the best fit for your organization’s needs. The promising results seen here suggest that Cobalt 100 VMs could be a game-changer for many data-intensive applications, offering a new level of performance for the most demanding analytics tasks.

Have you experimented with Cobalt 100 VMs in your workflows? I’d love to hear about your experiences and insights in the comments below!
