Data-driven forecasting without subjective estimates using the Monte Carlo Method


Estimations are an essential and unavoidable (like taxes) part of any project. Avoid them at your own peril! You've been warned.


What's at stake anyway?

Estimating any work item (WI) is part of the planning process, and it is beneficial for several reasons:

  1. it gathers the team to better understand (through critical thinking and imagination) what they're trying to build and deliver to customers;
  2. it gives the team a proper sense of priorities and gets them thinking about opportunity costs, such as tech debt (did I mention taxes already?), so they can focus on the customers and their needs;
  3. it gives the organization a better view of resource needs, so it can fund the most viable experiments (those that will yield the highest customer value) and see where it can remove waste;
  4. and yes, it helps manage the expectations of all the stakeholders invested in getting those great features to their beloved users.


Also, planning (and estimations) benefit from being done throughout all project phases, essentially due to the nature of the user needs and expectations (uncertain as they come) and also the complexity and ambiguity embedded in the software development world (despite all the progress in methodologies, tools, philosophies and approaches). Why? Because it deals with people (ok, processes and technologies as well, although these are far more amenable to stabilization).

This is something any project manager (or other delivery stakeholder) needs to understand and, most importantly, accept. Only then can we start to continuously find ways to manage and walk through that complexity, uncertainty and ambiguity (the famous CUA acronym from Andy Grove's great insights in his book High Output Management) to steer projects towards success - or as Churchill would put it:

Success is not final, failure is not fatal: it is the courage to continue that counts.


So...

Given that challenge of providing estimates within our projects, I'm a major advocate and enthusiast of the Flow metrics (data-driven as a principle, of course), which are (according to the book Flow Metrics for Scrum Teams by Will Seele and Daniel Vacanti):

WIP: The number of work items (WIs) started but not finished.
Cycle Time: The amount of elapsed time between when a work item started and when a work item finished.
Work Item Age: The amount of elapsed time between when a work item started and the current time.
Throughput: The number of WIs finished per unit of time. (Note: the measurement of throughput is the exact count of WIs).
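
To make the four definitions concrete, here is a minimal Python sketch. The `WorkItem` record, its field names and the sample dates are hypothetical, and counting both the start day and the finish day (the `+ 1`) is an assumed convention rather than something the definitions above prescribe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    # Hypothetical record; field names are illustrative, not from any tool.
    key: str
    started: Optional[date]
    finished: Optional[date] = None


def wip(items, today):
    """Count items started on or before `today` and not finished by then."""
    return sum(
        1 for i in items
        if i.started is not None and i.started <= today
        and (i.finished is None or i.finished > today)
    )


def cycle_time_days(item):
    """Elapsed days from start to finish (assumed: both days are counted)."""
    return (item.finished - item.started).days + 1


def age_days(item, today):
    """Elapsed days from start to `today` for an item still in progress."""
    return (today - item.started).days + 1


def throughput(items, period_start, period_end):
    """Exact count of items finished within [period_start, period_end]."""
    return sum(
        1 for i in items
        if i.finished is not None and period_start <= i.finished <= period_end
    )


# Made-up sample data for illustration only.
items = [
    WorkItem("A", date(2019, 1, 2), date(2019, 1, 5)),
    WorkItem("B", date(2019, 1, 3)),
    WorkItem("C", date(2019, 1, 4), date(2019, 1, 4)),
]
today = date(2019, 1, 8)

current_wip = wip(items, today)
cycle_a = cycle_time_days(items[0])
age_b = age_days(items[1], today)
finished_first_week = throughput(items, date(2019, 1, 1), date(2019, 1, 7))
```

Note that throughput is a plain count per period, which is what makes it easy to sample from later on.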

And we arrive at the core concept of Flow (according to Seele & Vacanti as above):

The movement of potential value through a given process

to, essentially, deliver to your customer - period. (You can read the aforementioned book for the full picture: what Flow represents for Kanban and how it applies to Scrum teams.)


Then...

Let me arrive at (my) intended destination by focusing, in this article, on a specific tool for forecasting several WIs (the great benefit being that it works with a Kanban approach as well as for Scrum teams):

The Monte Carlo Method (and Monte Carlo Simulation, MCS, specifically) is a statistical sampling technique where a simulation is used to predict the probability of different outcomes given the input of random variables.

That is, you can:

  • forecast a completion date (the result comes in the form of a probability p, where p < 1);
  • based on the historical (pruned) data for your team's throughput rate (these are the random variables: each simulated run builds a sample by randomly selecting from that historical data);
  • through a simulation with X iterations (10K or more; more iterations reduce the statistical uncertainty of the results because the simulation explores a larger portion of the possible outcomes, giving a more complete picture of the probability distribution).
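
The three points above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code of the tool discussed in this article: the function names, the weekly granularity, the sample history and the backlog size of 30 are all made up for the example.

```python
import random
from datetime import date, timedelta


def simulate_weeks_to_finish(weekly_throughput, backlog_size, rng):
    """One trial: replay randomly chosen past weeks until the backlog is done."""
    remaining = backlog_size
    weeks = 0
    while remaining > 0:
        remaining -= rng.choice(weekly_throughput)  # sample with replacement
        weeks += 1
    return weeks


def monte_carlo_when(weekly_throughput, backlog_size, start, trials=10_000, seed=None):
    """Return one simulated completion date per trial."""
    if not any(t > 0 for t in weekly_throughput):
        # Without a single productive week a trial would never terminate.
        raise ValueError("need at least one week with finished items")
    rng = random.Random(seed)
    return [
        start + timedelta(weeks=simulate_weeks_to_finish(weekly_throughput, backlog_size, rng))
        for _ in range(trials)
    ]


# Made-up history: items finished in each of the last eight weeks.
history = [3, 5, 2, 4, 0, 6, 3, 4]
dates = monte_carlo_when(history, backlog_size=30, start=date(2019, 3, 4), seed=42)
```

Each trial yields one possible completion date; the forecast is the distribution of those dates across all trials, never a single one of them.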


So show us the tool already!

Here it is: Monte Carlo Simulation Tool (*)


In essence:

  1. go to your tool of choice for work management (such as Jira);
  2. select the project and scope of tickets that you want to forecast (this could be a Board, set of Boards, or even a query that collects tickets from different filters);
  3. export them to a CSV file;
  4. feed them to the tool for input and run it!


So, if you want to take a quick look at what it can provide, go straight here and you can see:

A throughput per week chart based on the last X days (that you can define as part of the pruning process to get the most suited data for proper and more accurate results from the simulation):

This is the historical data for your team's performance, measured as the number of items finished. As the rule of thumb goes, the quality of your past data determines the quality of your forecast (or, as the famous data science adage puts it, 'garbage in, garbage out') - so take your time to select a period that is representative and stable enough to be considered:

  • one common scenario to watch out for is a newly formed team: its data will not represent a controlled/stable process, so it's not a good idea to use it;
  • another, similar to the one before, is a new team member joining, which is likely to affect performance;
  • yet another is longer absence periods or business seasonality (a strong argument for this metric is that all other factors are already embedded in the data, but longer periods that disrupt performance, such as summer holidays, should not be selected).
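
As a rough illustration of the pruning step, the sketch below counts finished items per week from a CSV export, keeping only the last X days. The column name `Resolved`, the ISO date format and the sample rows are assumptions; a real export will likely need different names and date parsing.

```python
import csv
import io
from collections import Counter
from datetime import date, datetime, timedelta


def weekly_throughput(csv_text, as_of, window_days, date_column="Resolved"):
    """Count finished items per week within the last `window_days` days."""
    cutoff = as_of - timedelta(days=window_days)
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(date_column) or "").strip()
        if not raw:
            continue  # no finish date: the item is still in progress
        finished = datetime.strptime(raw, "%Y-%m-%d").date()
        if cutoff < finished <= as_of:
            # Key each item by the Monday of the week it finished in.
            counts[finished - timedelta(days=finished.weekday())] += 1
    # Walk every week in the window so that weeks with zero finished
    # items are kept; they are real data points for the simulation.
    week = cutoff + timedelta(days=1)
    week -= timedelta(days=week.weekday())
    result = []
    while week <= as_of:
        result.append((week, counts.get(week, 0)))
        week += timedelta(weeks=1)
    return result


# Made-up export: A-4 is unfinished, A-6 falls outside the window.
sample = """Key,Resolved
A-1,2019-03-04
A-2,2019-03-06
A-3,2019-03-12
A-4,
A-5,2019-03-26
A-6,2019-01-02
"""
per_week = weekly_throughput(sample, as_of=date(2019, 3, 31), window_days=28)
```

If the window does not start on a Monday or end on a Sunday, the first and last weeks are only partially covered, so it may be safer to pick a window aligned to whole weeks.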



A histogram for the distribution of the results generated by the tool based on potential delivery dates

This chart is the output of the simulation of randomly selected inputs (the throughput rate of your team for the past X amount of days) and it plots the potential completion dates by frequency.



A histogram with percentiles (70th, 85th and 95th) with the corresponding completion date and its assigned probability

This chart is a reframing of the previous chart: the data is now in ascending order, so we get the potential completion dates with their corresponding cumulative probability (going from 0%, least likely, to 100%, most likely) along with the percentiles mentioned.
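
A small sketch of how such a cumulative view and its percentiles could be derived from a list of simulated completion dates. The function names and the ten sample trials are made up, and the nearest-rank percentile rule used here is one common choice, not necessarily the one the tool applies.

```python
from collections import Counter
from datetime import date


def cumulative_probabilities(completion_dates):
    """Map each date to the share of trials that finished on or before it."""
    total = len(completion_dates)
    running = 0
    result = []
    for day, count in sorted(Counter(completion_dates).items()):
        running += count
        result.append((day, running / total))
    return result


def percentile_date(completion_dates, pct):
    """Earliest date by which at least `pct` percent (an integer) of trials finished."""
    ordered = sorted(completion_dates)
    # Nearest-rank rule, using integer ceiling division to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank - 1, 0)]


# Ten made-up trial results, standing in for 10K simulated dates.
trial_dates = (
    [date(2019, 5, 7)] * 2
    + [date(2019, 5, 14)] * 5
    + [date(2019, 5, 21)] * 2
    + [date(2019, 5, 28)]
)

cumulative = cumulative_probabilities(trial_dates)
p70 = percentile_date(trial_dates, 70)
p85 = percentile_date(trial_dates, 85)
p95 = percentile_date(trial_dates, 95)
```

The last entry of the cumulative list always reads 1.0, but only because every simulated trial has finished by that date; it says nothing about a guarantee for the real project.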

The 95th percentile is the date by which 95% of the simulated runs had finished, so it is the safest of the three dates to commit to. However, there is a caveat here: do not confuse this with certainty. As previously mentioned, many factors impact team performance, and even with the utmost care in selecting past performance data we're still dealing with uncertainty; this tool is an exercise in simulating potential outcomes and plotting their probabilities.

So please don't announce to your stakeholders:

The project will be finished by 2019-05-28.

But rather:

There is a high probability (95%) of the project finishing on or before 2019-05-28.

It might seem like semantics but it's not - the first sentence announces a completion date as a certainty (not allowing for newer information to come in and update the estimates, and thus the completion date), whereas the second communicates both a completion date with a high chance of being met and the uncertainty around it ("on or before" signals that it can in fact finish sooner, and the 95% leaves a 5% chance that it finishes later).







(*) This tool was developed by Jacob Bo Tiedemann (who also wrote this amazing article called "You must be this tall to use Agile metrics") and made available in this repo. I have forked the repo and made a few changes/fixes, mainly:

  • Fixed the Distribution of Monte Carlo Simulation 'When' chart to pass the correct param to the simulation method
  • Fixed the Probabilities of Completion Dates chart to calculate the percentiles from the distribution dataframe as processed after the sampling from the MCS

Hi Bernardo, thanks for the mention and happy to see you enjoyed the book! I really enjoyed your article. I do feel a bit uncomfortable with the final graphic, the completion probability histogram. A novice might read that and conclude 'Oh, so there's a 100% chance of delivery May 27th', which is of course not the intent.
