What is behind this number?

In data-driven conversations, numbers are used to support arguments. Some companies, like Amazon, nurture data-driven cultures, and I experienced this firsthand. As a Solutions Architect dealing with web performance, I had to dive into latency numbers to identify areas for improvement. Later, in a leadership role, I had to make decisions based on numbers (e.g., where to hire the next Solutions Architect), challenge numbers used in stakeholder arguments (e.g., revenue growth figures), and defend opinions using numbers (e.g., the impact of a report during year-end reviews or promotion processes).

Through these experiences, I saw how numbers can help make better decisions, but also how they can be weaponized by individuals or competitors to advance their own agendas. In this article, I will share several tips to be more resilient against weaponized numbers.

Tip 1 - Where is the data?

Let's start with the obvious: do not accept claims that are not backed by numbers. Challenge statements such as:

  • "We had significant revenue growth this year" -> Ask: "Can you quantify 'significant'? Is it 5%? 10%? 50%? 100%?"
  • "This individual had an outstanding impact through the book they wrote" -> Ask: "Can you quantify this impact? How many books were sold? What projects or revenues were influenced?"

At Amazon, we call such vague adjectives "weasel words," and we encourage individuals to replace them with actual data.

Tip 2 - Where does the data come from?

To audit the quality and legitimacy of presented data, it's crucial to know its source. It's easy to fabricate numbers to sound data-driven in arguments. A data point that is not verifiable is not a reliable data point.

Tip 3 - How is it calculated?

Some data points are straightforward to understand when presented. For example, when you hear "revenue growth of 30% YoY," we all know how it's calculated. However, other data points can be trickier to understand, and the calculation method must be inspected.

For example, if someone claims, "The latency of your competitor's CDN service is 15% lower than yours," try to understand what they mean by "latency":

  • How was latency measured? Was it based on a VP's anecdotal experience loading a web page? Was it simulated using synthetic testing? Or was it measured using Real User Monitoring techniques?
  • How was the number aggregated? Is it an average, median, or 90th percentile? P90 tells a different story from P50. Ideally, you should look at histograms, as a single number cannot easily represent the full reality.
  • What is the aggregation period? For instance, a P99 measurement of latency data points over a day is very different from the average of 24 P99 measurements taken hourly.
  • Was the comparison done under the same conditions? It's not valid to compare the latency of two systems at different times or under different loads.
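To make these aggregation questions concrete, here is a small Python sketch on synthetic data. The latency samples are an assumption for illustration only: they are drawn from a log-normal distribution (a common stand-in for long-tailed latency) with a fixed seed, and one hypothetical "congested" hour is made four times slower. The sketch prints the mean, P50 and P90 of the same data, compares the P99 over the whole day with the average of the 24 hourly P99 values, and draws a crude text histogram. The `percentile` helper uses the nearest-rank definition, which is only one of several conventions in use.

```python
import math
import random
import statistics


def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def synthetic_day(seed=42, per_hour=1000, slow_hour=18):
    """Hypothetical latency samples in ms: 24 hours, one of which is congested."""
    rng = random.Random(seed)
    day = []
    for hour in range(24):
        # Log-normal gives the long right tail typical of latency data.
        mu = math.log(200) if hour == slow_hour else math.log(50)
        day.append([rng.lognormvariate(mu, 0.5) for _ in range(per_hour)])
    return day


def text_histogram(samples, edges):
    """Count samples per [edges[i], edges[i+1]) bucket and print a bar chart."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(edges) - 1):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    for i, c in enumerate(counts):
        # One '#' per 200 samples keeps the bars readable.
        print(f"{edges[i]:>6.0f}-{edges[i + 1]:<6.0f} ms | {'#' * (c // 200)}")
    return counts


if __name__ == "__main__":
    hours = synthetic_day()
    all_samples = [s for hour in hours for s in hour]

    # Same data, three different "latency" numbers.
    print(f"mean : {statistics.mean(all_samples):7.1f} ms")
    print(f"P50  : {percentile(all_samples, 50):7.1f} ms")
    print(f"P90  : {percentile(all_samples, 90):7.1f} ms")

    # Same data, two different "P99" numbers.
    daily_p99 = percentile(all_samples, 99)
    avg_hourly_p99 = statistics.mean(percentile(h, 99) for h in hours)
    print(f"P99 over the whole day    : {daily_p99:7.1f} ms")
    print(f"average of 24 hourly P99s : {avg_hourly_p99:7.1f} ms")

    text_histogram(all_samples, [0, 25, 50, 100, 200, 400, 800, float("inf")])
```

Because the slow hour contributes most of the day's worst samples, the daily P99 lands well above the average of the hourly P99s, even though both are labelled "P99". The histogram shows the shape that any single number hides.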

As an anecdote, the importance of histograms in representing performance data was driven home for me by Jim Roskind, the inventor of the QUIC protocol, which today carries the modern, high-performance HTTP/3. Some years ago, at a specialist meeting in Seattle, Jim was invited to explain QUIC to us. To our surprise, he spent most of the time explaining how he had set up latency data collection in Chrome browsers to build latency histograms. Only in the last 20 minutes did he explain how he used these histograms to actually develop the QUIC protocol and iterate on its performance. This approach underscores how critical comprehensive data representation is in performance analysis.

Tip 4 - Can you break it down?

Sometimes, you need to break down the provided number to better understand it. For example:

  • If you're told a salesperson achieved 110% of their quota, try to break it down by product line, customer segment, or territory. They might have had an unexpected 300% attainment in one segment but only 20% in the others.
  • For performance metrics, a CDN service might be doing extremely well in one country but below average in others.
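A breakdown like the sales example above amounts to a group-by. The following Python sketch uses made-up segment names and figures (in $K), chosen only so that the headline lands on 110% while one segment is at 300% and the others at 20%; the helper names are hypothetical.

```python
from collections import defaultdict


def attainment_by(records, key):
    """Sum quota and actual per value of `key`; return attainment in percent."""
    quota = defaultdict(float)
    actual = defaultdict(float)
    for r in records:
        quota[r[key]] += r["quota"]
        actual[r[key]] += r["actual"]
    return {k: 100 * actual[k] / quota[k] for k in quota}


def overall_attainment(records):
    """Headline attainment: total actual over total quota, in percent."""
    total_quota = sum(r["quota"] for r in records)
    total_actual = sum(r["actual"] for r in records)
    return 100 * total_actual / total_quota


# Hypothetical figures in $K, chosen so the headline number is 110%.
records = [
    {"segment": "Enterprise", "quota": 900, "actual": 2700},
    {"segment": "Mid-market", "quota": 1000, "actual": 200},
    {"segment": "SMB", "quota": 900, "actual": 180},
]

if __name__ == "__main__":
    print(f"overall    {overall_attainment(records):.0f}%")
    for segment, pct in attainment_by(records, "segment").items():
        print(f"{segment:<10} {pct:.0f}%")
```

The headline figure is a quota-weighted average, so a single strong segment can carry it. The same grouping applies to performance data, for example latency grouped by country instead of attainment grouped by segment.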

Tip 5 - How does it compare?

Finally, consider how the number compares to relevant benchmarks:

  • A sales professional achieving 30% revenue growth might seem impressive, but how does it compare with other sellers? The market might actually allow for 100% growth.
  • How does it compare to last year's performance? If it was 50% last year, it's actually slowing down.
  • How does it compare to the revenue base? Achieving 30% growth on a $100K annual revenue base is much easier than 30% growth on a $10M base.
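The three comparisons above can be put side by side in a few lines. This Python sketch is illustrative only: the revenue histories and peer growth rates are invented, and the small seller's base is about $90K rather than exactly $100K so that last year's growth comes out at a round 50%.

```python
import statistics


def yoy_growth(previous, current):
    """Year-over-year growth in percent."""
    return 100 * (current - previous) / previous


def compare(revenue, peer_growths):
    """Put one seller's growth next to three benchmarks.

    revenue: [two years ago, last year, this year] in dollars.
    peer_growths: this year's growth (%) of comparable sellers.
    """
    two_years_ago, last_year, this_year = revenue
    return {
        "growth_pct": yoy_growth(last_year, this_year),
        "peer_median_pct": statistics.median(peer_growths),
        "last_year_growth_pct": yoy_growth(two_years_ago, last_year),
        "absolute_gain": this_year - last_year,
    }


# Hypothetical figures: both sellers grew 30% this year.
small = compare([60_000, 90_000, 117_000], peer_growths=[80, 100, 120])
large = compare([8_000_000, 10_000_000, 13_000_000], peer_growths=[80, 100, 120])

if __name__ == "__main__":
    for name, result in (("small base", small), ("large base", large)):
        print(f"{name}: {result['growth_pct']:.0f}% growth "
              f"(peer median {result['peer_median_pct']:.0f}%, "
              f"last year {result['last_year_growth_pct']:.0f}%), "
              f"+${result['absolute_gain']:,.0f}")
```

With these made-up inputs, both sellers report the same 30% headline. The first is 70 points below the peer median and slowing down from 50%, while the second is accelerating from 25%, and their absolute gains ($27K versus $3M) differ by two orders of magnitude.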

Recommended Reading

I recommend the book "Factfulness" by Swedish physician and statistician Hans Rosling (2018). While I don't entirely agree with its premise that the world is in a much better state than most people believe, the book provides valuable insights into the instincts that distort our perspective on the world and offers tools for more fact-based thinking.

I learned about this book from Arthur Petitpierre, a friend and a Principal SA at AWS, during re:Invent last year. Arthur has spent years of his career diving into latency numbers in his specialization, High Performance Computing ^^

By applying these tips and maintaining a critical mindset, you can become more resilient to potential misuse of data in decision-making processes.


Jim Roskind

Vice President and Distinguished Engineer at Amazon.com

7 months

Very nice posting! It is great to see you and others push so methodically for measurements, with deep dives into the metrics. You mentioned a talk I gave about the evolution of the QUIC protocol, which grew into HTTP/3. Here is a link to the 2016 pre-Amazon talk about QUIC and other real-world measurements, which was the basis of that Amazon talk: https://www.facebook.com/watch/?v=1695131504093280
