You have more data than you think
Fernando Cuenca
I help teams and organizations to better deliver services, projects and digital products, using Kanban and other modern management practices / AKC, AKT, KCP, KMP, CSM, PSMII
"I don't have enough data to make any meaningful analysis... I need to wait until I collect more..."
Actually, you need less data than you think:
- 5 data points are enough to know the order of magnitude of the distribution's SCALE (are we talking about days? months? weeks? years?)
- 12 data points: take the central 6 data points; they determine the "range of the median" (the "typical case", "this is how long things usually take")
- 30 data points: things get more interesting:
  - take the lowest 6 data points: range of the "best case" (10th percentile, "this is how fast we can be")
  - take the highest 6 data points: range of the "worst case" (90th percentile, "this is how bad it can get")
  - take the central 10 data points: range of the "typical case" (median, or 50th percentile)
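The rules of thumb above can be sketched in a few lines of Python. This is only an illustration, not the author's tooling: the function name `percentile_ranges` and the lead times are hypothetical, and the slices simply follow the counts given in the list. Sample sizes other than 5, 12 and 30 are rejected rather than guessed at, since the text gives no rule for them.

```python
def percentile_ranges(samples):
    """Return (low, high) ranges following the rules of thumb in the text.

    Hypothetical helper: only the three sample sizes named in the text
    (5, 12, 30) are handled; anything else raises ValueError.
    """
    data = sorted(samples)
    n = len(data)

    def span(chunk):
        # A range is just the smallest and largest value of the slice.
        return (chunk[0], chunk[-1])

    if n == 5:
        return {"scale": span(data)}            # whole sample
    if n == 12:
        return {"typical": span(data[3:9])}     # central 6 points
    if n == 30:
        return {
            "best": span(data[:6]),             # lowest 6 points
            "typical": span(data[10:20]),       # central 10 points
            "worst": span(data[-6:]),           # highest 6 points
        }
    raise ValueError("this sketch only covers 5, 12 or 30 data points")


# Hypothetical lead times in days, made up purely for illustration.
lead_times = [4, 9, 2, 7, 5, 12, 3, 6, 8, 5, 10, 6]
print(percentile_ranges(lead_times))
```

With these made-up numbers the "typical case" range is 5 to 8 days, which is the kind of figure you would then hold up against what the customer expects.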
In all these cases you can compare the ranges you get from the data to the expectations of your customers or stakeholders, and use that as a guide to stimulate improvement.
To end with an "alexeism":
"An improved service is better than a more precise model of an unsatisfactory service" -- Alexei Zheglov
Some additional clarifications
A clarification on the meaning of the ranges you find with this technique: they refer to a high confidence range (roughly 90%) for the location of the given percentile. So, for example, the 6 central data points in a dataset of 12 samples give us a range where, with roughly 90% confidence, we can expect to find the median.
In the example diagram above, for 12 data points the range would be 3 to 9, meaning: I can say with 90% confidence that the median for that distribution will be located there. Of course, I can't say how close to 3 or how close to 9, and there's a small chance it will fall outside the range.
The point here is not a high degree of accuracy, but to show that a few data points are enough to have an informed starting point for a conversation. For example, if someone claims that the "work here takes months", the 5 data points in the example above are enough for me to respond that it's likely not the case, and that we should be discussing "weeks" and not "months".
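For readers who want to check the confidence figures rather than take them on trust, here is a minimal sketch of one standard way to do it. It assumes independent observations from a continuous distribution, so the number of observations falling below a given percentile follows a binomial distribution. The function name `coverage` and the rank choices are mine, not the author's. Under these assumptions the ranges come out near 90% rather than exactly at it, from about 85% for the central 6 of 12 up to about 94% for the full range of 5.

```python
from math import comb


def coverage(n, p, lo, hi):
    """Probability that the p-th quantile lies between the lo-th and hi-th
    smallest of n independent observations (1-based ranks).

    Assumes a continuous distribution, so each observation falls below the
    quantile with probability p and the count below it is Binomial(n, p).
    """
    # The quantile sits between ranks lo and hi when at least lo and
    # at most hi - 1 observations fall below it.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi))


cases = [
    ("5 points, whole range, median", (5, 0.5, 1, 5)),
    ("12 points, central 6, median", (12, 0.5, 4, 9)),
    ("30 points, lowest 6, 10th percentile", (30, 0.1, 1, 6)),
    ("30 points, central 10, median", (30, 0.5, 11, 20)),
    ("30 points, highest 6, 90th percentile", (30, 0.9, 25, 30)),
]
for label, args in cases:
    print(f"{label}: {coverage(*args):.3f}")
```

The same function works for any sample size, percentile and pair of ranks, so it can also be used to see how much confidence is gained or lost by widening or narrowing a range by one data point.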
Sort of: what's the rational basis for predicting the process into the future? An aggregate of more than a dozen data points? Consider: is the underlying process exhibiting a predictable range of variation? What is it?
Below is real-world WIP data from a team I coached in early 2020. On the left are the first 9 data points I gathered working with the team. Using those data points, what would enumerative statistics (which answer questions of "How many? How much?") tell us about the future performance of the system? Consider the chart on the right showing the next five days. What would enumerative statistical techniques tell us beyond adding to the counts? In the process behaviour chart (PBC), you can see this is outside the norm of the system so far, even when considering the last five days. Something has occurred and it's not good. Where would you think the data goes next?
Deming's observation here is that we need to be careful to remember that empirical evidence is never complete. Ergo, don't use enumerative techniques to predict the performance of a system; use analytical techniques, like a PBC.
Co-founder and Principal Consultant at SquirrelNorth
(4 months ago) OK, let's draw one sample of 30 data points. How many of them are greater than the 90th percentile, and how many are less? Well, I got 1 greater and 29 less this time. Next sample: 27. Next: 26. Let's try a few more. 30? Yes, this is a small sample, so it's possible for something with only a 10% probability of occurring not to occur in it. 23??? Not likely, but possible. OK, let's automate and repeat this experiment, say, 10,000 times, relying heavily on the Excel function COUNTIF and copy-paste. The result is in the chart below, with the ends of the 90% confidence interval highlighted. It covers the 6 gaps between elements 24 and 30. Can this be done for any N data points, percentile P and confidence interval Q, with a formula? Yes.
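The experiment described in this comment can be reproduced without Excel. The sketch below is an assumption-laden stand-in, not the commenter's spreadsheet: it draws from a uniform distribution (where the 90th percentile is simply 0.9), uses a fixed seed so the run is repeatable, and the function name `simulate` is hypothetical. The choice of distribution should not matter, because the tally only depends on whether each draw lands above or below the percentile.

```python
import random
from collections import Counter


def simulate(n=30, p=0.9, runs=10_000, seed=1):
    """Tally how many of n draws fall below the p-th percentile, per run."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(runs):
        # For Uniform(0, 1) the p-th percentile is simply p.
        below = sum(rng.random() < p for _ in range(n))
        tally[below] += 1
    return tally


tally = simulate()
runs = sum(tally.values())
for below in sorted(tally):
    print(f"{below:2d} below, {30 - below:2d} above: {tally[below] / runs:.3f}")

# Share of runs where the percentile falls in one of the 6 gaps between
# elements 24 and 30, i.e. where 24 to 29 of the draws land below it.
share = sum(tally[k] for k in range(24, 30)) / runs
print(f"share of runs inside the gaps between elements 24 and 30: {share:.3f}")
```

Changing `n`, `p` and the range of counts lets a skeptic repeat the check for any other sample size and percentile.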
Co-founder and Principal Consultant at SquirrelNorth
(4 months ago) Estimating the median from a sample of 5 observations is a well-known probability trick. The arithmetic is the same as in a common card game problem: your 2 opponents hold 5 cards of the same suit; what's the chance each of them holds at least 1? One observation is 50/50 to land on either side of the median. All five landing on the same side is unlikely, but has a non-negligible probability of 2^-5 = 1/32. The same applies on the other side, leaving 15/16 = 94% in the middle. Estimating, e.g., the 90th percentile from a sample of 30 is in the next comment.
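The arithmetic in this comment is small enough to verify by brute force. The sketch below enumerates all 32 equally likely ways that five observations can land below ("L") or above ("H") the median, assuming independent observations that each have a 50/50 chance of falling on either side. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Each observation is "L" (below the median) or "H" (above), equally likely.
outcomes = list(product("LH", repeat=5))

# The median escapes the sample's range only when all five land on one side.
one_sided = [o for o in outcomes if len(set(o)) == 1]

p_miss = Fraction(len(one_sided), len(outcomes))
p_inside = 1 - p_miss
print(p_miss, p_inside, float(p_inside))  # prints: 1/16 15/16 0.9375
```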
Google Ads Specialist helping fCMOs generate revenue for their clients
(4 months ago) I need to send this to a skeptic. How can they double-check the math, which at this point goes against their intuition?