The Conundrum of Estimating Story Points
Published 25 Aug 2022
For many noteworthy reasons, I have grown very wary of estimating work items with points. In this article, I simply wish to present some head-scratching true stories along with some alternative ideas.
Why must we estimate at all?
The original purpose of estimation in traditional projects was simply to forecast delivery milestones. Project managers solicited LOE (Level of Effort) estimates in hours from the project team (developers, quality assurance, etc.) in order to forecast the project duration, then provided this data to the PMO Gate Committee. This rarely proved even remotely accurate. It was also a very frustrating activity for the project team, who, afraid of being held accountable to such numbers, often padded them.
In Scrum, the purpose is to help teams improve their ability to commit to finishing an increment within a time-boxed iteration. An additional benefit is that the exercise of estimation itself helps elicit discussions about the true scope of work (i.e., “What are we really talking about here?”).
Relative Estimation: The Fibonacci Sequence
“Fibonacci numbers appear unexpectedly often in mathematics. They also appear in biological settings, such as branching in trees, the arrangement of leaves on a stem, the fruit sprouts of a pineapple, the flowering of an artichoke, an uncurling fern, and the arrangement of a pine cone's bracts.” [i]
Using the modified version of the Fibonacci sequence, team members can estimate the size of a work item (typically a user story) in relative comparison to other work items. The key term here is relative size comparison of work. This is often where the confusion sets in, as our old friend, the industrial-era project-management mindset, cannot resist the lure to standardize everything and achieve the Holy Grail of project management ... perfect predictability!
Personally, I find the phenomenon of the Fibonacci sequence in nature intriguing. Furthermore, the fact that a modified version of it was incorporated into an estimation technique in Scrum was, at the time, quite innovative. However, as I put forward in this article, the potential for misuse of the data, plus the blind adherence to the validity of the numbers, has increasingly put the focus on the customer and the flow of value at risk.
The Potential for Misuse of Data
Just to be blatantly honest (foregoing all subtlety here), based on more than 12 years of experience, I declare that too many executives and managers simply do not understand story points. And try as we may to reserve the resulting data for its intended purpose, they find ways of obtaining and misusing it. What follows are two true examples.
True Story 1: Agile Pilot Team Alpha crushes Agile Pilot Team Beta
During the beginning of an “agile transformation” I was asked to be the Product Owner of our very first Agile Pilot team. The Agile Champions hired a coaching consultancy firm to teach our team “the ways of Agile”. After we had created our backlog (which was a painstaking endeavor) our coach had us pick a random user story from our backlog. We did. Then he told us to assign it 8 points. We did. The problem, as we would soon discover, was that this randomly picked user story ended up being a very simple user need, something like, “As a user, I need to have a hyperlink from the home page to the third-party search portal, so that I can easily navigate there.”
He then instructed us to estimate all other user stories in the backlog with a value of points relative to the complexity and effort of this first hyperlink user story. So over the course of the next few hours, we estimated all user stories on the backlog, most being higher in complexity and effort.
“Random activities often lead to random chaos, just not necessarily immediately thereafter”. - Me
As you may have guessed, Agile Pilot Team Alpha had no user stories with fewer than 8 points (no 1s, no 2s, no 3s, and no 5s). The majority of our user stories were estimated at 13 or 20 points. We even had a couple of 40s. Fast forward in time: by our fifth sprint, we were rocking 70 to 90 story points per sprint. Our average velocity, which was relative to our reference point of course, was around 78 points. But a high number is meaningless without its reference point, and in our situation the reference point for an “easy” user story was 8 points. So it should not matter, right? Something about working software being the primary measure of progress, as we were told.
Based on our success, another Agile Pilot Team (Team Beta) was “launched” (much to the delight of the coaching consulting firm, who so loved using the rocket metaphors). However, this team did not make the same choice of picking an easy user story as their first reference point. Their backlog, contrary to ours, had plenty of user stories estimated in the range of 1 to 5 points. In no time, the highly motivated Team Beta was also rocking the value delivery. However, their average velocity was much lower than Team Alpha's; less than half of it, in fact, at roughly 30 points per sprint. But as stated, it's all relative. I mean, one does not need to be an Einstein to understand the theory of relativity in this context.
Well, somehow, in some form, a manager got hold of this velocity information. (I think a consultant made an ill-informed decision to include the data on a couple of PowerPoint graphs during a status report.) This manager raised the question: why is Team Alpha so much more productive than Team Beta? The consultants, correctly so, explained the rationale of relative estimation and why we should not compare the two teams with each other; rather, the slides were merely included to show, within the context of each team, how consistently each was performing, in accordance with the Agile Manifesto principle, "Agile processes promote sustainable development…".
But paid consultants hold steadfast to their principles only as long as is prudent, before they become anxious about their gig. This manager insisted on finding out why one team was so demonstrably outperforming the other. He also made it rather clear that it would be a good thing to be able to compare teams. So the consultant coaches were sent back to figure out a way to standardize the estimation points and to figure out how to improve Agile Pilot Team Beta. Team Beta, which had until this point been highly motivated, naturally became quite perplexed. In the end, the only suggestion from the consultants to the manager was to standardize the way stories are estimated; for example, 1 story point = 2 hours, 2 story points = 4 hours, and so on and so forth and into the abyss goes our ability for agility!
Should story points be converted to hours of effort?
If a specific team chooses to consider the scope of work in terms of hours, that is their choice, but they should be made aware of the flaws with this approach. However, when attempting to standardize across teams, whether to achieve better predictability or to allow teams to be readily compared against each other, this becomes a major agile anti-pattern.
Why is this a bad idea?
True Story 2: A little information can be a dangerous thing
In an unrelated true story, a senior executive got his hands on the data for a master backlog burn-down chart. The original goal of this data and chart was to visualize the progress and forecast when the entire backlog would be delivered: simply divide the total points of the backlog by the cumulative average velocity, extrapolate out the number of sprints, and voilà! As you might have already surmised, confidence in the progress was lacking, because this data also showed that the backlog was growing at a faster pace than the burn-down rate.
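The forecast described above is simple division, and the flaw is easy to see once you account for backlog growth. Here is a minimal sketch (function and variable names are mine, not from the original chart):

```python
import math

def sprints_to_finish(backlog_points, avg_velocity, growth_per_sprint=0):
    """Naive burn-down forecast: remaining points divided by net burn rate.

    Net burn per sprint is velocity minus backlog growth; if the backlog
    grows faster than the team burns it down, there is no finish date.
    """
    net_burn = avg_velocity - growth_per_sprint
    if net_burn <= 0:
        return None  # forecast never converges
    return math.ceil(backlog_points / net_burn)

print(sprints_to_finish(820, 82))       # 10 -- assuming the backlog stops growing
print(sprints_to_finish(820, 82, 100))  # None -- the situation in this story
```

The chart looked precise, but the second call is the honest answer: with the backlog growing faster than the burn-down rate, extrapolation yields no delivery date at all.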
So the exec had a plan. He would incentivize the delivery of … yes … story points. More Points = Higher Bonus!
"If you reward a team for delivering more points, they will deliver you more points." - Anonymous
The plan was really very simple to calculate: If the three teams maintained their combined average velocity of 82 points, then they would receive their standard target bonus. However, if they increased their combined average velocity by 10% (~90 points in this case), then they would receive a 10% increase to their target bonus. If, however, their combined average velocity decreased, they would lose 1% from their target bonus for each percent under the average, up to 10% of it (down to a combined average velocity of 74 points).
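Under one plausible reading of the rule above (bonus moves 1% per 1% change in velocity, capped at plus or minus 10%), the incentive arithmetic looks like this — and the last line shows exactly why it is gameable:

```python
def bonus_adjustment_pct(avg_velocity, baseline=82.0):
    """Sketch of the incentive rule described above (my interpretation):
    the target bonus shifts 1% per 1% change in combined average velocity,
    capped at +/-10%."""
    change_pct = (avg_velocity - baseline) / baseline * 100
    return max(-10.0, min(10.0, change_pct))

print(bonus_adjustment_pct(82))   # 0.0  -- standard target bonus
print(bonus_adjustment_pct(90))   # ~9.8 -- near the +10% cap
print(bonus_adjustment_pct(74))   # ~-9.8 -- near the -10% cap
print(bonus_adjustment_pct(164))  # 10.0 -- inflate every estimate, max out the bonus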
Well, this is quite a fool-proof incentive strategy, is it not? No possible way to game this system, right? (Please excuse my sarcasm.) After hearing of this new plan, without having even been consulted, I delivered a carefully crafted warning of its risky repercussions. I will not go into the details of my correspondence, but I'm sure you can think of many points yourself.
I even called another agile coach and mentor of mine and explained what was happening, mostly to get a peer sanity check, and to see if I was missing something. My mentor, without hesitation, said to me: “Have the teams change every user story to 20 points, full stop!” In retrospect, I wish I had taken his advice, but I figured that my reasoning and rationale would prevail. It did not. The team members, to their credit and as expected, figured out that it was quite practical to always favor the higher numbers, based on, you know, unanticipated complexity. Unsurprisingly, they delivered a lot of story points. Too bad the same could not be said about the value and quality delivered.
Estimation Constraints and Biases
While the book Thinking, Fast and Slow by Daniel Kahneman is not specific to agile themes, it is very relevant with respect to the various social-psychological facets of organizational culture.
My intention for this reference is to quickly summarize a few points. (1) When it comes to intuitive thinking, we rely too much on it, and it is faulty. (2) When it comes to rational and analytical thinking, we often become lazy and rely on our intuition to save us time and effort. (3) Even when we make the effort to use our rational and analytical thinking, it is very biased and easily influenced.
Now, for the next exercise, please gather your new scrum development team together and ask them to perform the estimation exercise.
If this is not already difficult enough, please ponder the many hindrances to objectivity.
Ok, so we get it. It is not easy to do, especially with a new team. But eventually, we can get the team to all agree on a final number, right? Right, we can, but often begrudgingly. In the beginning of a new Scrum team, we may have a few motivated individuals who really enjoy lengthy, spirited story-point debates, such as why they will not drop from an 8 to a 5, or move up from a 2 to a 3. But in due time, this routine begins to grow tiresome.
Estimation Fatigue
Estimation Fatigue occurs when long, drawn-out arguments during estimation sessions drag some members down (e.g., a several-minute debate over whether a story should be a 3 or a 5). They simply become frustrated with the exercise, which they view as arbitrary and a tedious waste of time. As a result, the motivation of the team declines and many team members become complacent; they quickly acquiesce to the strongest voice in the room or the first person to offer up a number. [iv]
Of course, we can try to coach our way out of the constraints, biases, and fatigue. We can bring in highly respected agile coaching consultants and hire top-notch scrum masters to coach these team members into embracing the user story estimation. Or we can introduce a scaling agile framework that requires 10 weeks of story-point estimation in order to plan our Iteration Loads. Or perhaps we can coach the managers as well. [v]
Here’s a thought. We can listen to the teams and try something new that they might suggest. Dogmatic adherence to a well-established theory or method, without scrutinizing the value-add, is absurd. Daniel Kahneman referred to this bias as “blind respect.” [vi]
Outcomes over Output: The Flow of Value
Here is an exercise for you: query all user stories in your current backlog that are refined and estimated, then find the average number of story points per user story. You will likely find that the numbers trend toward a central tendency and blur into one large laundry list of similar-sized items.
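The exercise above takes one line of code once the backlog is exported. A minimal sketch, with an entirely hypothetical backlog (the story IDs and points are illustrative, not from any real project):

```python
from statistics import mean

# Hypothetical refined-and-estimated backlog: story id -> story points
backlog = {"US-101": 3, "US-102": 5, "US-103": 5, "US-104": 8, "US-105": 3,
           "US-106": 5, "US-107": 8, "US-108": 5, "US-109": 3}

avg_points = mean(backlog.values())
print(f"average points per story: {avg_points}")
```

In a real backlog with dozens of stories, that average is where the estimates cluster — which is precisely the central tendency the exercise is meant to reveal.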
But you may be asking yourself, why does that matter? It matters because we tend to lose sight of the true goal, which is to determine what we should deliver to produce the customer's desired outcome. If my bigger concern is being able to fit the right number of story points into a given iteration (whether it be a sprint or a program increment), then I have lost sight of the value delivery. I have witnessed teams re-plan sprints (within a scaled agile framework) to re-position a user story into a later sprint, simply because the number of estimated points did not fit within their capacity (or what they call the “Load”).
Moreover, we should consider analyzing our prioritization method in terms of how it is influenced. What are the factors that help us determine our prioritization decisions? For example, if we are using numbers, such as WSJF (weighted shortest job first), are these numbers valid? I ask this because the effort estimation (job size/duration), which is also based on the Fibonacci numbers, has a significant influence on the calculation results on which we base prioritization.
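To make the influence concrete, here is a minimal sketch of the standard WSJF formula (cost of delay divided by job size, each component scored on the modified Fibonacci scale). Holding the cost-of-delay scores fixed and varying only the job-size estimate shows how much one fuzzy number can swing the priority:

```python
def wsjf(business_value, time_criticality, risk_opportunity, job_size):
    """WSJF = cost of delay / job size, with cost of delay as the sum of
    business value, time criticality, and risk reduction / opportunity
    enablement scores (all relative Fibonacci estimates)."""
    cost_of_delay = business_value + time_criticality + risk_opportunity
    return cost_of_delay / job_size

# Identical cost of delay (8 + 5 + 3 = 16); only the job-size estimate differs.
print(wsjf(8, 5, 3, 2))   # 8.0  -- jumps to the top of the priority list
print(wsjf(8, 5, 3, 13))  # ~1.2 -- sinks toward the bottom
```

One contested Fibonacci estimate for job size moves the same work item from the top of the queue to the bottom, which is exactly why the validity of those numbers deserves scrutiny.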
When we treat estimated effort and complexity points as factual scientific data, to the point of negatively influencing our delivery of value, we have lost sight of the true objective.
What is the customer asking for? What can we produce in a short time to quickly obtain feedback? These are the questions that we should be asking ourselves and our team members.
Quick tips:
"Our highest priority is to satisfy the customer through early and continuous delivery of valuable software" - AMP 1
AC/DC
I cannot emphasize enough that we should really focus on the AC (acceptance criteria) and the DC (dependencies and constraints)! Have you ever experienced the awkwardness of demonstrating the functionality of a user story to the Product Owner and/or a vested stakeholder, then being called out for not satisfying the AC? I have experienced this, and worse. In one such situation, after being called out, we opened up the AC to review together on the big screen and the developer flat out said that he had never looked at the AC. WHAT?!
That was quite embarrassing. I swore from that day forward that during the sprint planning and again halfway through each sprint, we as a team would re-open each user story and re-review the AC. Then we would also go through the AC both before and after we demonstrated the functionality.
This got me to thinking. I had encouraged my team members to create and modify their tasks based on satisfying the AC, and I have noticed that unforeseen AC are occasionally discovered while the team is creating tasks; the two influence each other. Ultimately, the Product Owner wants the AC to be satisfied so that the team can deliver the desired value to the users. More importantly, the AC should be used to collect feedback and improve the next increment.
If you truly want to size up your backlog user stories, first go through the AC to make sure that they are accurate and comprehensive, and then ask the team if it is possible to deliver all of them within a specific iteration. If the answer is yes, set the User Story as ready to pull. If the answer is no or there is an air of uncertainty, then break up the user story.
AC = Acceptance Criteria / DC = Dependencies & Constraints
The team will eventually learn how many user stories they can commit to within each sprint if the user stories are consistently created with small-batch AC and it becomes routine to consider the DC. The team can save the unnecessary time it takes to arbitrarily estimate points and spend that time more wisely on other activities. (If relative estimation is still desired, do so after the AC/DC refinement; perhaps try T-shirt sizing to avoid the misuse of quantifiable numbers.) When the team decides among themselves to simply pull in their comfortable number of user stories, with other user stories ready to pull as a stretch, you have a self-organized and autonomous team! We can then begin to measure the flow throughput, rather than the number of points completed.
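Measuring flow throughput is even simpler than tracking velocity: count finished stories per sprint instead of summing points. A minimal sketch, with hypothetical numbers:

```python
import math
from statistics import mean

# Hypothetical history: user stories finished per sprint (counts, not points)
throughput_history = [4, 6, 5, 5, 6, 4]

avg_throughput = mean(throughput_history)
remaining_stories = 40
forecast = math.ceil(remaining_stories / avg_throughput)
print(f"average throughput: {avg_throughput} stories/sprint")
print(f"~{forecast} sprints to clear the remaining backlog")
```

Because the unit is "a story the team agreed fits in a sprint" rather than a debatable point value, there is nothing to inflate and nothing for a manager to compare across teams.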
Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done. - AMP5
Conclusion
Working software is the primary measure of progress. -AMP 7
I hope you found this article informative and perhaps entertaining.
Best of luck as always in being and becoming agile!