Benchmarks don't lie but liars use benchmarks

"Lies, damned lies, and statistics" is a phrase that describes the persuasive power of statistics in bolstering weak arguments. Another variation of this saying goes: "Statistics don't lie, but liars use statistics."

Assessing and benchmarking different technologies is a common practice across industries. In this discussion, I'll focus on data-centric benchmarks for data platforms. A former colleague, Marco Ullasci, often remarked, "Benchmarks don't lie, but liars use benchmarks." The saying captures the tactics data platform vendors employ in their benchmarks to attract new customers.

Let's discuss some of the tactics:

1. Benchmarks are set long before a single line of code is ever written

I have always maintained that the outcome of a benchmark is determined before the first line of code is ever written. How can that be? Consider this analogy: you own a brand-new Ferrari while I have a bicycle, and I challenge you to a race. Your first question might be about my sanity. But if I get to choose the racecourse on race day, I'll steer us onto a muddy hiking trail.

If your objective is to find the best mode of transport for a muddy trail, then this benchmark is relevant. If that isn't the goal, you'd be left wondering why we are racing on a hiking trail at all, beyond the simple fact that I am determined to win.

In a parallel manner, within the domain of data platforms, vendors often try to construct benchmarks that showcase them in a favorable light, even when there's no apparent advantage to the customer.

2. Sponsored Benchmark Studies

We are witnessing a rise in sponsored benchmark studies orchestrated by "independent industry analysts." These studies or papers consistently conclude with the sponsoring company's product emerging as the top contender. Even more troubling is that I have encountered the same research firm publishing various papers on identical topics, each funded by different companies, yet consistently crowning the sponsoring company's product as the victor.

Sometimes the study is not sponsored by the vendor directly but by a system integrator whose core business depends on that vendor. System integrators often fail to disclose this relationship and present themselves to the unsuspecting reader as an independent consulting organization.

In one recent report, the analysts stated, "We conducted a good faith estimate of the competition." When did benchmarks transition into exercises rooted in faith?

Whenever I come across a sponsored benchmark study, I generally dismiss it: roughly 99% of the time, the study promotes the sponsoring company's product. If a study is not sponsored, I quickly check whether most of the publishing organization's business depends on the vendor in question. That inherent conflict of interest prevents the analysts from conducting an impartial assessment.

3. Law of Large Numbers in Benchmarks

The Law of Large Numbers states that when a significant number of trials are conducted, the average results should closely approach the expected value. This alignment becomes even more precise as the number of trials increases.

For those familiar with basic statistics, it's common knowledge that the probability of obtaining a head on a balanced coin is 50%. However, if I flip the coin four times and achieve three heads, can I conclude that the likelihood of getting a head is 75% for a balanced coin?

In other words, only after a large number of trials does the observed average converge to the expected value; four coin flips tell you almost nothing about the true probability.
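Here is a minimal sketch (plain Python, nothing platform-specific, seed chosen arbitrarily) that makes the point concrete: a handful of flips can easily show 75% heads, but the running average settles near 50% as the flips pile up.

```python
# A minimal sketch of the Law of Large Numbers with a fair coin.
import random

random.seed(7)
heads = 0
for flips in range(1, 100_001):
    heads += random.random() < 0.5            # one fair coin flip
    if flips in (4, 100, 10_000, 100_000):
        print(f"{flips:>7,} flips -> observed P(heads) = {heads / flips:.3f}")

# With 4 flips the observed rate can easily be 0.750 or 0.250; by 100,000 flips
# it sits very close to 0.500. Small benchmark runs are just as noisy.
```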

[Image: the sample mean of repeated trials converging toward the expected value. Source: https://www.statology.org/law-of-large-numbers/]

This principle significantly influences benchmarks. Consider a scenario where you test your queries with only 1TB of data while your production systems manage over 1PB of data. The 1TB sample might fit into the memory of most contemporary data platforms, yielding exceptionally positive results. To counteract potential biases from tiny sample sizes, conducting tests that more closely mirror production conditions, encompassing volume, complexity, and concurrency, is advisable.
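As a back-of-the-envelope illustration, a quick check of whether the benchmark's working set fits in the cluster's aggregate memory tells you how much to trust the numbers. Every figure in the sketch below is an assumption to replace with your own; the point is the comparison, not the values.

```python
# Back-of-the-envelope memory-fit check. All constants are assumptions.
test_data_tb = 1.0           # size of the raw benchmark dataset
compression_ratio = 3.0      # assumed columnar compression on the platform
nodes = 16                   # assumed benchmark cluster size
ram_per_node_tb = 0.5        # assumed memory per node

working_set_tb = test_data_tb / compression_ratio
cluster_ram_tb = nodes * ram_per_node_tb

if working_set_tb < cluster_ram_tb:
    print(f"~{working_set_tb:.2f} TB working set fits in {cluster_ram_tb:.1f} TB "
          "of aggregate RAM: expect flattering results that will not extrapolate.")
else:
    print("Working set exceeds memory: disk and network behaviour will be visible.")
```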

4. Maintaining Technical Debt

Another technique I've observed legacy vendors employing in benchmarks involves compelling the customer to utilize the code from their current platform as-is, making only minimal modifications. They are well aware that their code has undergone optimization for their platforms over decades, and it bears a substantial load of technical debt. This strategy creates significant hurdles for other vendors to compete equitably.

This approach is reasonable if you want to maintain the status quo and ensure that your code doesn't change, carrying the technical debt over to the new platform. However, if your goal is to modernize the platform and eliminate lingering technical debt, it becomes imperative to define a target architecture that all vendors must adhere to, ensuring a level playing field. After all, as the saying goes, "Nothing changes if nothing changes."

5. The Myth of Industry Standard Benchmarks

To address the challenges mentioned earlier in devising meaningful benchmarks, the Transaction Processing Performance Council (TPC) introduced specifications for various data-centric benchmarks in 1994. While this appeared promising in theory, it ultimately failed to capture real-world complexities.

Consider the following limitations within these benchmarks:

  1. Uniform Data Distribution: This notion contradicts real-world data, which rarely exhibits perfect uniformity. (When have we encountered real-world data without any skew?)
  2. Fixed Query Count: Enforcing a fixed number of queries resembles knowing the exam questions in advance. (Do you only run a handful of known queries on your system all the time?)
  3. Sequential Execution Bias: This approach disregards the concurrency in real-world scenarios. (What about the intricacies of concurrent operations?)
  4. Limited Data Model: The benchmarks restrict themselves to 24 tables. However, real-world production data platforms often harbor over 100,000 tables across customers and industries.

Due to these factors, TPC benchmarks seem akin to a solved puzzle. The data model has remained stagnant since 1994. Vendors invest years fine-tuning their platforms to excel in TPC-DS benchmarks. Nonetheless, with the constrained number of tables and queries, we fall victim to the small numbers effect. Vendors can engineer specific index structures and optimizations to enhance the performance of TPC queries, masking the fact that real-world queries suffer from dire performance issues.
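To make limitation 1 concrete, here is a small sketch (assuming only NumPy is available; row counts, key counts, and the skew exponent are all illustrative assumptions) that contrasts the uniform keys a TPC-style data generator produces with the Zipf-like skew most production tables show. The hot keys that dominate real workloads simply never appear in the synthetic data.

```python
# Uniform "benchmark-style" keys vs. Zipf "real-world-style" keys.
import numpy as np

rng = np.random.default_rng(42)
n_rows, n_keys = 1_000_000, 10_000

uniform_keys = rng.integers(0, n_keys, size=n_rows)    # synthetic benchmark data
zipf_keys = rng.zipf(a=1.3, size=n_rows) % n_keys       # skewed, production-like data

def top_share(keys: np.ndarray, pct: float = 0.01) -> float:
    """Fraction of all rows owned by the hottest `pct` of key values."""
    counts = np.sort(np.bincount(keys))[::-1]
    top_n = max(1, int(len(counts) * pct))
    return counts[:top_n].sum() / keys.size

print(f"uniform keys: top 1% of values hold {top_share(uniform_keys):.1%} of rows")
print(f"zipf keys:    top 1% of values hold {top_share(zipf_keys):.1%} of rows")

# On skewed data a handful of hot keys can dominate a join or group-by partition,
# producing stragglers that a uniform-distribution benchmark will never surface.
```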

Recommendations

Wondering what steps to take? Here are some practical tips:

  1. Personalize Your Benchmark: Your ideal benchmark is tailored to your unique data, operating environment, and future objectives.
  2. Unveil Deceptive Techniques: Watch out for the "smoke and mirrors" strategies highlighted earlier. Prioritize genuine data volumes, complexity, and concurrency by simulating real production scenarios.
  3. Set a Target Architecture: Craft a target architecture for your data platforms, then devise tests to assess their compatibility with this future state.
  4. Single Data Copy: Don't allow vendors to juggle multiple copies of the data; ensure all tests run against a single data copy for an authentic evaluation.
  5. Introduce Surprise Queries: Infuse ad-hoc queries during testing to gauge how the system handles unexpected workloads and reveal its adaptability (see the sketch after this list).
  6. Consider Total Cost of Ownership: Embrace a holistic perspective by incorporating the total cost of ownership in your assessment. Prioritize price/performance over merely throwing hardware at software issues.
  7. Evaluate Environmental Impact: Extend your benchmark criteria to encompass the environmental implications of the technology.
  8. Incorporate Non-Functional Attributes: Include non-functional aspects like ease of use, availability, fault tolerance, and agility in your evaluation matrix.
  9. Test Recovery Scenarios: Put systems through their paces by incorporating failure recovery scenarios. Purposefully induce node and component failures to evaluate graceful recovery.
  10. Anticipate Migration Complexities: Grasp the intricacies of migrating from your current solution to the new platform. This awareness is crucial for seamless transitions.
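To tie tips 2, 4, 5, and 6 together, here is a minimal, hedged sketch of a benchmark harness: concurrent sessions, ad-hoc queries mixed in unannounced, a single shared data copy, and a crude price/performance figure at the end. SQLite is only a stand-in engine and every constant is an assumption; swap in your platform's driver, your schema, and your real hourly rate.

```python
# Hedged harness sketch: concurrency, surprise queries, one data copy, cost/query.
import random
import sqlite3
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

KNOWN_QUERIES = [
    "SELECT COUNT(*) FROM sales",
    "SELECT region, SUM(amount) FROM sales GROUP BY region",
]
ADHOC_QUERIES = [  # injected unannounced to test adaptability (tip 5)
    "SELECT region, AVG(amount) FROM sales WHERE amount > 500 GROUP BY region",
]
CLUSTER_COST_PER_HOUR = 40.0   # hypothetical rate for price/performance (tip 6)
SESSIONS, QUERIES_PER_SESSION = 8, 25

def setup(db: str = "bench.db") -> str:
    """Create a toy single copy of the data (tip 4) that all sessions share."""
    conn = sqlite3.connect(db)
    conn.execute("DROP TABLE IF EXISTS sales")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [(random.choice("NESW"), random.random() * 1000) for _ in range(200_000)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
    return db

def run_session(db: str) -> list[float]:
    """One simulated user: mostly known queries, roughly 20% surprises."""
    conn = sqlite3.connect(db)
    latencies = []
    for _ in range(QUERIES_PER_SESSION):
        sql = random.choice(ADHOC_QUERIES if random.random() < 0.2 else KNOWN_QUERIES)
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        latencies.append(time.perf_counter() - start)
    conn.close()
    return latencies

if __name__ == "__main__":
    db = setup()
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=SESSIONS) as pool:   # concurrency (tip 2)
        sessions = list(pool.map(run_session, [db] * SESSIONS))
    wall = time.perf_counter() - wall_start
    lat = sorted(l for s in sessions for l in s)
    cost = CLUSTER_COST_PER_HOUR * wall / 3600
    print(f"p50 = {statistics.median(lat) * 1000:.1f} ms, "
          f"p95 = {lat[int(len(lat) * 0.95)] * 1000:.1f} ms, "
          f"cost/query = ${cost / len(lat):.6f}")
```

The same skeleton can be extended to replay captured production queries instead of the toy statements above, or to take a worker node down mid-run to exercise the recovery scenarios in tip 9.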

Are there any other tips you would add to the list?


If you like, please subscribe to the FAQ on Data newsletter and/or follow Fawad Qureshi on LinkedIn.


Sajid Abbas

Sr. Systems Engineer


Great article. I have a question though. To personalize a benchmark while keeping a single data copy, how practical is it to maintain one large, real test dataset with varied data quality? Context: different vendors need and operate on independent datasets. Also, if you have a huge dataset but its values are similar or uniform, then no matter how many trials you run, your test coverage is limited.

Franco Patano

Strategic Data and AI Advisor


What about open-sourced benchmark code that can be inspected and independently reproduced? In your view, do benchmarks provide any value? What about for consumption planning?

I think that's a credit to Mark Twain. So accurate in this case as well. As always, Fawad A. Qureshi, your perspective is right on.
