Data-driven forecasting without subjective estimates using the Monte Carlo Method
Estimations are an essential and unavoidable (like taxes) part of any project. Avoid them at your own peril! You've been warned.
What's at stake anyway?
Estimating any work item (WI) is part of the planning process, and it is beneficial for several reasons:
Planning (and estimating) also benefits from being done throughout all project phases, mainly because user needs and expectations are uncertain (as they come) and because software development remains complex and ambiguous (despite all the progress in methodologies, tools, philosophies and approaches). Why? Because it deals with people (ok, processes and technologies as well, although these are far more amenable to stabilization).
This is something any project manager (or other delivery stakeholder) needs to understand and, most importantly, accept. Only then can we start to continuously find ways to manage and walk through that complexity, uncertainty and ambiguity (the famous CUA acronym from Andy Grove's great insights in his book High Output Management) to steer projects towards success - or as Churchill would put it:
Success is not final, failure is not fatal: it is the courage to continue that counts.
So...
Given the challenge of providing estimates within our projects, I'm a major advocate and enthusiast of the Flow metrics (data-driven as a principle, of course), which are (according to the book Flow Metrics for Scrum Teams by Will Seele and Daniel Vacanti):
WIP: The number of work items (WIs) started but not finished.
Cycle Time: The amount of elapsed time between when a work item started and when a work item finished.
Work Item Age: The amount of elapsed time between when a work item started and the current time.
Throughput: The number of WIs finished per unit of time. (Note: the measurement of throughput is the exact count of WIs).
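To make the four definitions concrete, here is a minimal sketch in Python. The WorkItem record, its field names and the sample dates are hypothetical (they are not taken from the book or from any tool), and counting the start day as day 1 is an assumed convention, so adapt both to however your board exports its data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    # Hypothetical record: one entry per work item exported from your board.
    key: str
    started: date
    finished: Optional[date] = None  # None means the item is still in progress


def wip(items):
    """WIP: number of items started but not finished."""
    return sum(1 for item in items if item.finished is None)


def cycle_time_days(item):
    """Cycle Time: elapsed days between start and finish.

    Assumption: the start day counts as day 1, so an item started and
    finished on the same day has a cycle time of 1.
    """
    if item.finished is None:
        raise ValueError(f"{item.key} is not finished yet")
    return (item.finished - item.started).days + 1


def work_item_age_days(item, today):
    """Work Item Age: elapsed days between start and the current date."""
    return (today - item.started).days + 1


def throughput(items, period_start, period_end):
    """Throughput: exact count of items finished within the period (inclusive)."""
    return sum(
        1
        for item in items
        if item.finished is not None and period_start <= item.finished <= period_end
    )


# Made-up sample data, for illustration only.
items = [
    WorkItem("A-1", date(2019, 4, 1), date(2019, 4, 4)),
    WorkItem("A-2", date(2019, 4, 2), date(2019, 4, 5)),
    WorkItem("A-3", date(2019, 4, 3)),
]
today = date(2019, 4, 8)

print("WIP:", wip(items))
print("Cycle time of A-1:", cycle_time_days(items[0]))
print("Age of A-3:", work_item_age_days(items[2], today))
print("Throughput 1-7 April:", throughput(items, date(2019, 4, 1), date(2019, 4, 7)))
```

Note that all four metrics need only two timestamps per item (started and finished), which is why they are cheap to collect from almost any tracking tool.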
And we arrive at the core concept of Flow (according to Seele & Vacanti as above):
The movement of potential value through a given process
to, essentially, deliver to your customer - period. (You can read the aforementioned book for the full picture: what flow represents in Kanban and how it applies to Scrum teams.)
Then...
Let me arrive at (my) intended destination by focusing, in this article, on a specific tool that can be used for forecasting several WIs (the great benefit being that it works both with a Kanban approach and for Scrum teams):
The Monte Carlo Method (and Monte Carlo Simulation, MCS, specifically) is a statistical sampling technique where a simulation is used to predict the probability of different outcomes given the input of random variables.
That is, you can:
So show us the tool already!
Here it is: Monte Carlo Simulation Tool (*)
In essence:
So, if you want a quick look at what it can provide, go straight here and you can see:
A throughput-per-week chart based on the last X days (which you define as part of the pruning process, so that the simulation works from the most suitable data and gives more accurate results):
This is the historical data from your team's performance as measured by the number of items finished. As the rule of thumb goes, the quality of your past data will determine the quality of your forecast (or, as the famous data science adage puts it, 'garbage in, garbage out') - so take your time to select a period that is representative and has the proper conditions to be considered:
A histogram for the distribution of the results generated by the tool based on potential delivery dates
This chart is the output of the simulation: it repeatedly draws random samples from your team's throughput over the past X days and plots the resulting potential completion dates by frequency.
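For readers who want to see the mechanics, below is a minimal sketch of this kind of throughput-based simulation in Python. It is not the code of the tool linked in this article: the function name, its parameters and the sample history are assumptions made for illustration. Each trial replays the backlog day by day, picking a random day from the history and subtracting that day's throughput until nothing is left.

```python
import random
from collections import Counter
from datetime import date, timedelta


def simulate_completion_dates(daily_throughput, backlog_size, start, trials=10_000, seed=None):
    """Return one simulated completion date per trial.

    daily_throughput: items finished on each day of the chosen history window
                      (include the zero days, e.g. weekends, if you forecast
                      in calendar days).
    backlog_size:     number of work items still to be finished.
    start:            first day of work in the forecast.
    """
    if backlog_size <= 0:
        raise ValueError("backlog_size must be positive")
    if not any(t > 0 for t in daily_throughput):
        raise ValueError("history needs at least one day with finished items")

    rng = random.Random(seed)  # a fixed seed makes a run reproducible
    results = []
    for _ in range(trials):
        remaining = backlog_size
        days = 0
        while remaining > 0:
            # Sample one historical day (with replacement) and "replay" it.
            remaining -= rng.choice(daily_throughput)
            days += 1
        # Finishing on the first simulated day means finishing on `start`.
        results.append(start + timedelta(days=days - 1))
    return results


# Made-up history: items finished per calendar day over the last 14 days.
history = [0, 2, 1, 0, 3, 1, 0, 0, 1, 2, 1, 0, 2, 0]
dates = simulate_completion_dates(history, backlog_size=20, start=date(2019, 5, 1), seed=42)

# Frequency of each potential completion date, i.e. the data behind the histogram.
histogram = Counter(dates)
for day in sorted(histogram):
    print(day, histogram[day])
```

Sampling whole days from the history (rather than fitting an average) is what keeps the team's real variability, including the zero-throughput days, in the forecast.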
A histogram with percentiles (70th, 85th and 95th) with the corresponding completion dates and their assigned probabilities
This chart reframes the previous one: the potential completion dates are now in ascending order, each with its cumulative probability (rising from 0% to 100%), along with the percentiles mentioned.
The 95th percentile is the most conservative of these dates, the one with the highest probability attached. However, there is a caveat here: do not confuse this with certainty. As previously mentioned, a lot of factors impact team performance, and even with the greatest care in selecting the past performance data, we're still dealing with uncertainty; this tool is an exercise in simulating potential outcomes and plotting their probabilities.
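As a rough sketch of how such percentiles can be read off a list of simulated completion dates, the Python below sorts the dates and uses the nearest-rank method. The function names and the ten sample dates are hypothetical, and the tool itself may compute its percentiles differently. It assumes the usual reading of a percentile: the share of simulated runs that finished on or before a given date.

```python
import math
from datetime import date


def percentile_date(completion_dates, percentile):
    """Earliest date by which at least `percentile` % of the runs had finished.

    Uses the nearest-rank method on the sorted dates.
    """
    if not completion_dates:
        raise ValueError("no simulated dates given")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(completion_dates)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


def cumulative_probability(completion_dates, day):
    """Share of simulated runs that finished on or before `day`."""
    finished = sum(1 for d in completion_dates if d <= day)
    return finished / len(completion_dates)


# Made-up results of ten simulated runs, for illustration only.
runs = [date(2019, 5, d) for d in (24, 21, 27, 23, 20, 29, 22, 26, 25, 28)]

for p in (70, 85, 95):
    print(f"{p}th percentile: {percentile_date(runs, p)}")
print("Share finished by 2019-05-26:", cumulative_probability(runs, date(2019, 5, 26)))
```

With only ten runs the percentiles are coarse; with thousands of simulated runs the same functions give much smoother results.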
So please don't announce to your stakeholders:
The project will be finished by 2019-05-28.
But rather:
There is a high probability (95%) of the project finishing on or before 2019-05-28.
It might seem like semantics but it's not - the first sentence announces a completion date as a certainty (not allowing for newer information to come in and update the estimates, and thus the completion date), whereas the second communicates both a completion date with a high chance of being met and the uncertainty around it (the project may well finish earlier, and there is still a 5% chance that it finishes later).
If you've enjoyed reading this article, please consider:
(*) This tool was developed by Jacob Bo Tiedemann (who also wrote this amazing article called "You must be this tall to use Agile metrics") and made available in this repo. I have forked the repo and made a few changes/fixes, mainly:
Hi Bernardo, thanks for the mention and happy to see you enjoyed the book! I really enjoyed your article. I do feel a bit uncomfortable with the final graphic, the completion probability histogram. A novice might read that and conclude 'Oh, so there's a 100% chance of delivery May 27th', which is of course not the intent.