
You have more data than you think

Fernando Cuenca

I help teams and organizations to better deliver services, projects and digital products, using Kanban and other modern management practices / AKC, AKT, KCP, KMP, CSM, PSMII

Published: October 11, 2024


"I don't have enough data to make any meaningful analysis... I need to wait until I collect more..."

Actually, you need less data than you think:

- 5 data points are enough to know the order of magnitude of the distribution's SCALE (are we talking about days? weeks? months? years?)

- 12 data points: take the central 6 data points; they determine the "range of the median" (the "typical case": "this is how long things usually take")

- 30 data points: things get more interesting:

  - take the lowest 6 data points: range of the "best case" (10th percentile, "this is how fast we can be")

  - take the highest 6 data points: range of the "worst case" (90th percentile, "this is how bad it can get")

  - take the central 10 data points: range of the "typical case" (median, or 50th percentile)
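The rules of thumb above can be sketched in a few lines of code. This is a minimal illustration, not the author's tool. The function name `rule_of_thumb_ranges` is hypothetical, and the ranks assume one reading of the rules:

- for 12 points, the 4th through 9th smallest values;
- for 30 points, ranks 1 to 6 (best case), 25 to 30 (worst case) and 11 to 20 (typical case).

```python
from typing import Dict, Sequence, Tuple

Range = Tuple[float, float]


def ranks_range(xs: Sequence[float], first: int, last: int) -> Range:
    """Values at 1-based ranks `first` and `last` of an ascending list."""
    return xs[first - 1], xs[last - 1]


def rule_of_thumb_ranges(lead_times: Sequence[float]) -> Dict[str, Range]:
    """Hypothetical helper applying the small-sample rules of thumb.

    Lead times can be in any unit (days, weeks, ...); only their order matters.
    """
    xs = sorted(lead_times)
    n = len(xs)
    ranges: Dict[str, Range] = {}
    if n >= 5:
        # Smallest and largest values: enough to see the order of magnitude.
        ranges["scale"] = (xs[0], xs[-1])
    if n == 12:
        # Central 6 of 12 (ranks 4..9): range of the median.
        ranges["median"] = ranks_range(xs, 4, 9)
    if n == 30:
        # Lowest 6, highest 6 and central 10 of 30 (assumed rank choices).
        ranges["best_case_p10"] = ranks_range(xs, 1, 6)
        ranges["worst_case_p90"] = ranks_range(xs, 25, 30)
        ranges["typical_p50"] = ranks_range(xs, 11, 20)
    return ranges


if __name__ == "__main__":
    # Made-up lead times in days, for illustration only.
    print(rule_of_thumb_ranges([3, 14, 6, 21, 9]))  # {'scale': (3, 21)}
```

With these five made-up values, the range of 3 to 21 days points to a conversation about days and weeks rather than months.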


In all these cases, you can compare the ranges you get from the data with the expectations of your customers or stakeholders, and use that comparison as a guide to stimulate improvement.
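One way to make this comparison concrete is sketched below. It rests on three assumptions:

- lead times, where lower is better;
- a range taken from one of the rules of thumb;
- a single expected value supplied by the stakeholder.

The function `compare_to_expectation` and its wording are hypothetical.

```python
from typing import Tuple


def compare_to_expectation(observed: Tuple[float, float], expected: float) -> str:
    """Compare a lead-time range from the data with a stakeholder expectation.

    Assumes lower lead times are better and `observed` is (low, high).
    """
    low, high = observed
    if low > high:
        raise ValueError("observed range must be (low, high)")
    if expected < low:
        # Even the optimistic end of the range is slower than expected.
        return "gap: the data suggests the expectation is not being met"
    if expected > high:
        # Even the pessimistic end of the range beats the expectation.
        return "met: the data suggests the expectation is being met"
    # The expectation sits inside the range: these data can't tell either way.
    return "unclear: the expectation falls inside the range"


if __name__ == "__main__":
    # Hypothetical: worst-case (P90) range of 18 to 30 days vs. "within 14 days".
    print(compare_to_expectation((18, 30), 14))
```

A "gap" result is only a starting point for the improvement conversation. It says nothing about why the gap exists.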

To end with an "alexeism":

"An improved service is better than a more precise model of an unsatisfactory service." -- Alexei Zheglov

Some additional clarifications

A clarification on the meaning of the ranges you find with this technique: they are high-confidence (90%) ranges for the location of the given percentile. For example, the 6 central data points in a dataset of 12 samples give us a range where, with 90% confidence, we can expect to find the median.
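The confidence level can be checked with a short calculation, assuming independent draws from a continuous distribution. The true p-th percentile lies between the a-th and b-th smallest of n values exactly when the number of values below it is at least a and at most b - 1. That number follows a Binomial(n, p) distribution. Below is a minimal sketch; the function name `coverage` is chosen for illustration.

```python
from math import comb


def coverage(n: int, lo_rank: int, hi_rank: int, p: float) -> float:
    """Probability that the true p-quantile lies between the lo_rank-th and
    hi_rank-th smallest of n independent draws (1-based ranks).

    K, the number of draws below the quantile, is Binomial(n, p); the
    quantile is inside [x(lo_rank), x(hi_rank)] when lo_rank <= K <= hi_rank - 1.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo_rank, hi_rank))


if __name__ == "__main__":
    print(f"{coverage(5, 1, 5, 0.5):.4f}")     # min..max of 5, median: 0.9375
    print(f"{coverage(12, 4, 9, 0.5):.3f}")    # central 6 of 12, median: 0.854
    print(f"{coverage(30, 11, 20, 0.5):.3f}")  # central 10 of 30, median: 0.901
    print(f"{coverage(30, 25, 30, 0.9):.3f}")  # highest 6 of 30, P90: 0.884
```

Under this reading of "central" and "highest", the rules land between about 85% and 94%. That is close to, but not exactly, the 90% quoted. Widening the P90 range to the 24th through 30th values, as in Alexei Zheglov's comment, gives about 93%.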

In the example diagram above, for 12 data points the range would be 3 to 9, meaning: I can say with 90% confidence that the median for that distribution will be located there. Of course, I can't say how close to 3 or how close to 9, and there's a small chance it will fall outside the range.

The point here is not a high degree of accuracy, but to show that a few data points are enough to give an informed starting point for a conversation. For example, if someone claims that "work here takes months", the 5 data points in the example above are enough for me to respond that this is likely not the case, and that we should be discussing "weeks", not "months".

Delivery Manager Fieldnotes


Christopher R. Chapman

4 months ago

Sort of. What's the rational basis for predicting the process into the future? An aggregate of more than a dozen data points? Consider: is the underlying process exhibiting a predictable range of variation? What is it?

Below is real-world WIP data from a team I coached in early 2020. On the left are the first 9 data points I gathered while working with the team. Using those data points, what would enumerative statistics (which answer the questions "How many?" and "How much?") tell us about the future performance of the system? Now consider the chart on the right, showing the next five days. What would enumerative statistical techniques tell us, beyond adding to the counts? In the process behavior chart (PBC), you can see that this is outside the norm of the system so far, even when the last five days are taken into account. Something has occurred, and it's not good. Where do you think the data goes next?

Deming's observation here is that we need to remember that empirical evidence is never complete. Ergo, don't use enumerative techniques to predict the performance of a system; use analytical techniques, like a PBC.
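A common form of PBC for individual values is the XmR chart. Its natural process limits are the mean plus or minus 2.66 times the average moving range.

The sketch below assumes that form. The function names are hypothetical, and the WIP numbers are made up; they are not the team's data from the comment.

```python
from typing import Dict, List, Sequence


def xmr_limits(baseline: Sequence[float]) -> Dict[str, float]:
    """Natural process limits for an XmR (individuals and moving range) chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two values")
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    centre = sum(baseline) / len(baseline)
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    return {
        "centre": centre,
        "lower": centre - 2.66 * avg_mr,  # lower natural process limit
        "upper": centre + 2.66 * avg_mr,  # upper natural process limit
        "range_upper": 3.268 * avg_mr,    # upper limit for the moving ranges
    }


def points_outside(values: Sequence[float], limits: Dict[str, float]) -> List[int]:
    """Indices of values beyond the natural process limits (signals)."""
    return [i for i, v in enumerate(values) if v < limits["lower"] or v > limits["upper"]]


if __name__ == "__main__":
    baseline = [8, 9, 7, 8, 10, 9, 8, 7, 9]  # hypothetical daily WIP, 9 points
    next_days = [10, 12, 13, 14, 15]          # hypothetical following five days
    limits = xmr_limits(baseline)
    print(limits)
    print(points_outside(next_days, limits))  # [1, 2, 3, 4]
```

A point outside the limits signals that the system has changed. It does not explain why.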


Alexei Zheglov

Co-founder and Principal Consultant at SquirrelNorth

4 months ago

OK, let's draw one sample of 30 data points. How many of them are greater than the 90th percentile, and how many less? This time I got 1 greater and 29 less. Next sample: 27 less. Next: 26. Let's try a few more. 30? Yes: this is a small sample, so something with only a 10% probability of occurring may not occur in it at all. 23??? Not likely, but possible. OK, let's automate and repeat this experiment, say, 10,000 times, relying heavily on the Excel function COUNTIF and copy-paste. The result is in the chart below, with the ends of the 90% confidence interval highlighted. It covers 6 gaps, between elements 24 and 30. Can this be done with a formula for any N data points, percentile P and confidence level Q? Yes.
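The spreadsheet experiment described in this comment can be repeated in a few lines of code. This sketch is an assumed equivalent, not the commenter's workbook.

It draws uniform values on [0, 1), whose 90th percentile is exactly 0.9, so each draw lands below the percentile with probability 0.9. Only that comparison matters, so the result holds for any continuous lead-time distribution. The function name and seed are arbitrary.

```python
import random
from collections import Counter


def simulate_counts_below(n: int = 30, p: float = 0.9, trials: int = 10_000, seed: int = 1) -> Counter:
    """Repeat 'draw n values, count how many fall below the true p-quantile'."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(trials):
        # Uniform draws on [0, 1): the true p-quantile is p itself.
        below = sum(1 for _ in range(n) if rng.random() < p)
        counts[below] += 1
    return counts


if __name__ == "__main__":
    trials = 10_000
    counts = simulate_counts_below(trials=trials)
    for k in sorted(counts):
        print(k, counts[k])
    # P90 lies between the 24th and 30th smallest values when 24 <= K <= 29.
    inside = sum(counts[k] for k in range(24, 30)) / trials
    print(f"P90 between 24th and 30th value: {inside:.3f}")
```

The exact answer is the binomial sum of P(K = k) for k from 24 to 29, with K ~ Binomial(30, 0.9). That comes to about 0.93, and the simulated share should land close to it.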

Alexei Zheglov

Co-founder and Principal Consultant at SquirrelNorth

4 months ago

Estimating the median from a sample of 5 observations is a well-known probability trick. The arithmetic is the same as in a common card-game problem: your 2 opponents hold 5 cards of the same suit between them; what's the chance each of them holds at least 1? Each observation is 50/50 to land on either side of the median. All five landing below it is unlikely, but the probability, 2^-5 = 1/32, is not negligible. The same applies to the other side, leaving 1 - 2/32 = 15/16 ≈ 94% for the median to lie between the smallest and largest observations. Estimating, e.g., the 90th percentile from a sample of 30 follows in the next comment.
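The arithmetic in the comment can be written out in general form. Assume n independent observations from a continuous distribution with median m, where x_(1) and x_(n) are the smallest and largest observations; the comment uses n = 5.

```latex
\begin{aligned}
P(\text{all } n \text{ below } m) &= P(\text{all } n \text{ above } m) = \left(\tfrac{1}{2}\right)^{n} \\
P\bigl(x_{(1)} \le m \le x_{(n)}\bigr) &= 1 - 2\left(\tfrac{1}{2}\right)^{n} = 1 - 2^{1-n} \\
n = 5:\quad P &= 1 - \tfrac{2}{32} = \tfrac{15}{16} = 93.75\%
\end{aligned}
```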

Aldwin Neekon

Google Ads Specialist helping fCMOs generate revenue for their clients

4 months ago

I need to send this to a skeptic. How can they double-check the math, which at this point goes against their intuition?
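A skeptic can double-check the five-point claim by simulating it rather than trusting the algebra. This sketch assumes exponentially distributed lead times, chosen only because they are skewed like many real lead-time distributions. The true median of an exponential distribution with rate 1 is ln 2. The function name is hypothetical.

```python
import math
import random


def min_max_covers_median(sample_size: int = 5, trials: int = 100_000, seed: int = 7) -> float:
    """Share of samples whose min..max range contains the true median."""
    rng = random.Random(seed)
    true_median = math.log(2)  # median of an exponential distribution with rate 1
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        if min(sample) <= true_median <= max(sample):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Should print a value close to 15/16 = 0.9375.
    print(f"{min_max_covers_median():.4f}")
```

You can swap `expovariate` for another continuous distribution, together with its true median. One example is `rng.lognormvariate(0.0, 1.0)`, whose median is 1. The share should still be about 15/16, because all that matters is whether each draw lands above or below the median.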

