Tool: Spreadsheet for Project Duration Estimation (and Cake!)
Adam Salvail, MSc
ML Engineering Manager | AI Innovation | Structured Decision-Making | Developing Leaders | AI Roadmap Development | Expectations Communication | Scientist Performance Management | Career Growth
Last week, I presented a process for doing project prioritization. Part of the process requires the estimation of the duration of the project, as the labour is likely the main fixed cost of a software development project. However, getting an accurate estimate of a project’s duration can be useful for more than just prioritization, for example, planning and coordination with other teams. This week, I’ll share a method I’ve used to produce those predictions. From my experience, using this method reduces the number of late projects significantly. I attribute the success of the method to three factors: clarity, communication of uncertainty, and decomposition.
Let’s use this example throughout the article. Imagine that you want to bake a carrot cake. How long is it going to take before you can serve someone a slice of cake? Actually take 20 seconds to think about it and write down your prediction.
Got it?
I’ll play along with you. I’ve baked a few cakes in the past, so I estimate it’ll take about two and a half hours to make the cake. That same level of effort used to produce the estimate is often all people do. Let’s see how we can refine that.
Clarify
Funny question: what’s a carrot cake? The above estimate is completely meaningless to you unless you have the same understanding of what’s a carrot cake as I do. Is it one of Claire Saffitz layered brown butter carrot cake or the simpler square carrot cake and whipped cream from Chef Jean-Pierre? Or maybe you’re already laughing because you planned on baking a boxed carrot cake and buying the shelf stable cream cheese frosting they sell next to it. All three would require a significantly different time investment.
??Make sure people have a shared understanding of the expectations regarding a project. No serious estimates should be made before this is the case.
For software projects, I find that clarifying the deliverables is a big help in making sure everyone agrees on the objective. If the project is very complex, then the first phase should be to document the work to be done in greater detail. If nothing else, this will help evaluate whether the solution is even feasible.
For my carrot cake, I’ll go with Chef Jean-Pierre’s recipe. I never tried it and it seems like a much simpler version than Claire’s (which I did make in the past; it’s delicious!) Figuring that it is a simpler recipe, I’ll reduce my estimate to two hours.
Take the time to clarify what you had in mind when thinking about a carrot cake. Please write your answer in the comments; I need more carrot cake recipes! ??
Communication of Uncertainty
When I write “two hours”, it’s implied that I don’t actually expect to be able to bake a cake in exactly 7200 seconds. The problem is that this fuzziness is ambiguous unless efforts have been made to clearly define my uncertainty. This is even worse when multiple people contribute estimates with varying degrees of fuzziness. Should you communicate the median of your duration estimates? Or maybe share a surer bet, something closer to the 90th percentile to avoid the pain of dealing with a project that is running late? If you’re a leader organizing project planning, this is very important to communicate to the people doing the predictions.
However, successfully communicating uncertainty isn’t easy. There needs to be:
As a side note, I firmly believe that managers have a duty to protect the estimates provided by the team they support, especially if they seem high. Ask why the estimates are high rather than ask them to revise them. Estimates are a communication tool too!
As I wrote a few weeks ago, a good way to communicate uncertainty is to give a lower and upper bound on the duration. For example, when using the 10th and 90th percentiles, denoted p10 and p90 respectively, we say that, 8 times out of 10, the duration should fall between the p10 and the p90 bounds. The wider the interval, the more uncertainty people feel there is in the duration.
Specifically for durations, I like to characterize the shape of the uncertainty using a Weibull distribution. It is defined only for positive values, is skewed towards shorter times, but allows for a longer right tail, which represents the chance of a project running longer than anticipated. From the p10 and p90 estimates, you can find the parameters of the distribution, which, in turn, allows you to derive all the other statistics.
Obviously, my cake won’t take exactly two hours. In fact I think it’ll take between one hour if everything goes well and five hours if I have a big issue. Maybe my two hours estimate was a little on the optimistic side!
Decomposition
Even with a clear picture of the end goal, the way to get there can still be fuzzy. The best way to clear that fuzziness is to start decomposing the problem into smaller components, forcing ourselves to clarify the steps needed to produce the required deliverables.
Specifically, break down the steps into milestones. Describe each milestone in enough detail that someone else could pick up that task and execute it. If a given milestone is too complex, break it down further until you can easily express the work to be done in a few sentences.
For each milestone, I also recommend clarifying the definition of “done”. What needs to be true for someone to be completely done with the milestone?
For example, documentation and testing often cause delays (or worse, are skipped) because it wasn’t accounted for during the planning phase. Decomposition can help catch those instances.
Finally, instead of providing p10 and p90 estimates for the whole project, do so at the milestone level. Since they’re more granular, you are likely to have an easier time coming up with accurate estimates.
领英推荐
For my carrot cake, here is my decomposition into milestones. I looked at the recipe while doing the decomposition. I definitely got a few surprises!
Already, I can see that my initial two hours estimate was totally off! Decomposition (and reading the recipe) clarified what each step entailed and led to a more accurate picture of how long baking this cake is going to take! (And this wasn’t fake. Now I’m resigned to start at least a day in advance before being able to serve a nice slice of carrot cake.)
To get the duration of the whole project, resist the temptation to simply sum each percentile. The chance that everything goes almost perfectly (i.e., the duration is smaller than the p10 estimate) is much lower than the 1 in 10 chance communicated by the p10 total duration. It’s like saying that you’d have a 50% chance of getting five tails in a row when flipping a coin because the chance of getting a tail on each coin flip is 50%. Instead, you need to run a Monte Carlo simulation.
? Monte Carlo Simulation
Used to combine values produced by probability distributions. The simulated combined values are then used to generate statistics like the percentiles and the moments (mean and variance) of the resulting distribution. It is an alternative to deriving a theoretical formula (which is often impossible).
All you need to run this simulation is the following (although you’ll need some math background or rely on pre-built tools as I gloss over a lot of details):
The process goes like this:
I am providing my readers with a free template spreadsheet (.xslx) that showcases how such a spreadsheet can be created. Feel free to adapt it to your needs!
Applying this process to the carrot cake example, I get the following percentiles:
How to Use Simulated Statistics
Unless you have a strong background in mathematics, you are probably unfamiliar with having a function that can produce a slew of statistics rather than a single value. It can feel unwieldy. What does it get you?
If you have to keep a single value, I suggest you look at the 80th or 90th percentile for making plans requiring an estimate of the duration of the project. This way, you are less likely to suffer from late projects if you use a higher percentile rather than the mean or the median. If you finish early, I’m sure you can find something to do with the extra time.
The real value is when you use those numbers in the Decision Loops process to make structured decisions. When comparing projects, you might see that some projects have more uncertainty associated with their execution (i.e., a larger interval between their p10 and p90 values). This can be the source of useful conversations with the project team or indicate that you may want to use risk mitigating strategies to handle that uncertainty.
Comparing the distributions (e.g., by plotting the p10, p25, p50, p75, and p90) of multiple projects together may reveal that certain projects are much safer bets than others, even if their mean durations are similar. This tool will help you make more informed decisions.
As you may have seen from the link to the spreadsheet, I’ve finally set up my website. I’ll be posting my newsletter both here and there every week. With a bit more work on my part, it should be able to display mathematical formulas in a sane manner.
If you are interested in the derivation of the parameter and inverse cumulative distribution for the Weibull, let me know in the comments, and I’ll make an appendix to this post.
Have a wonderful week!