The Conundrum of Estimating Story Points
Fibonacci Spiral (Image by Gerd Altmann on Pixabay)

Published 25 Aug 2022

Conundrum definition

For many noteworthy reasons, I have grown very wary of estimating work items with points. In this article, I simply wish to present some head-scratching true stories along with some alternative ideas.

Why must we estimate at all?

The original purpose of estimation in traditional projects was simply to forecast delivery milestones. Project managers solicit LOE (Level of Effort) in hours from the project team (developers, quality assurance, etc.) in order to forecast the project duration and then provide this data to the PMO Gate Committee. This rarely proves even remotely accurate. It is also a very frustrating activity for the project team, who are afraid of being held accountable to such numbers, so they often pad them.

In Scrum, the purpose is to help teams improve their ability to commit to finishing an increment within a time-boxed iteration. An additional benefit is that the exercise of estimation itself helps elicit the discussions about the true scope of work (i.e., “What are we really talking about here?”).

Relative Estimation: The Fibonacci Sequence

“Fibonacci numbers appear unexpectedly often in mathematics. They also appear in biological settings, such as branching in trees, the arrangement of leaves on a stem, the fruit sprouts of a pineapple, the flowering of an artichoke, an uncurling fern, and the arrangement of a pine cone's bracts.” [i]

Fibonacci sequence in nature examples


Fibonacci and modified Story Points compared

Using the modified version of the Fibonacci sequence, team members can estimate the size of a work item (typically a user story) in relative comparison to other work items. The key term here is relative size comparison of work. This is often where the confusion sets in, as our old friend, the industrial-era-project-management-mind-set cannot refrain from the lure to standardize everything and achieve the Holy Grail of project management ... perfect predictability!

Personally, I find the phenomenon of the Fibonacci sequence in nature intriguing. Furthermore, the fact that a modified version of it was incorporated into an estimation technique in Scrum was, at the time, quite innovative. However, as I put forward in this article, the potential for misuse of the data, plus the blind adherence to the validity of the numbers, has increasingly put the focus on the customer and the flow of value at risk.

The Potential for Misuse of Data

Just to be blatantly honest (foregoing all subtlety here), based on more than 12 years of experience, I declare that too many executives and managers simply do not understand story points. And try as we may to maintain the resulting data for its intended purpose, they find ways of obtaining and misusing the data. What follows are two very true examples.

True Story 1: Agile Pilot Team Alpha crushes Agile Pilot Team Beta

During the beginning of an “agile transformation” I was asked to be the Product Owner of our very first Agile Pilot team. The Agile Champions hired a coaching consultancy firm to teach our team “the ways of Agile”. After we had created our backlog (which was a painstaking endeavor), our coach had us pick a random user story from our backlog. We did. Then he told us to assign it 8 points. We did. The problem, as we would soon discover, was that this randomly picked user story ended up being a very simple user need, something like, “As a user, I need to have a hyperlink from the home page to the third-party search portal, so that I can easily navigate there.”

He then instructed us to estimate all other user stories in the backlog with a value of points relative to the complexity and effort of this first hyperlink user story. So over the course of the next few hours, we estimated all user stories on the backlog, most being higher in complexity and effort.

“Random activities often lead to random chaos, just not necessarily immediately thereafter.” - Me


As you may have guessed, Agile Pilot Team Alpha had no user stories with fewer than 8 points (no 1s, no 2s, no 3s, and no 5s). The majority of our user stories were estimated at 13 or 20 points. We even had a couple of 40s. Fast forward in time: by our fifth sprint, we were rocking 70 to 90 story points per sprint. Our average velocity, which was relative to our reference point of course, was around 78 points. But the high number is meaningless without a reference point, and in our situation the reference point for an “easy” user story was 8 points. So it should not matter, right? Something about working software being the primary measure of progress, as we were told.

Based on our success, another Agile Pilot Team (Team Beta) was “launched” (much to the delight of the coaching consulting firm, who so loved using the rocket metaphors). However, this team did not make the same choice of picking an easy user story as their first reference point. Their backlog, contrary to ours, had plenty of user stories that were estimated in the range of 1 to 5 points. In no time, the highly motivated Team Beta was also rocking the value delivery. However, their average velocity was much lower than Team Alpha’s; it was less than half of Team Alpha’s average velocity. In fact, Team Beta finished roughly 30 points per sprint. But as stated, it’s all relative. I mean, one does not need to be an Einstein to understand the theory of relativity in this context.
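To make the "it's all relative" point concrete, here is a minimal Python sketch. It assumes, purely for illustration, that every story scales linearly from the team's reference story; the relative sizes and the second team's reference value of 2 are hypothetical and are not taken from either pilot team's real backlog.

```python
# Hypothetical sprint: each story's size expressed as a multiple of the
# team's easiest "reference" story. These multiples are illustrative only.
RELATIVE_SIZES = [1.0, 1.0, 2.5, 1.625, 2.5, 1.0]


def velocity(relative_sizes, reference_points):
    """Sum the points recorded when the reference story is worth
    `reference_points` and every other story scales linearly from it."""
    return sum(size * reference_points for size in relative_sizes)


# Same work, two different reference points.
anchored_at_8 = velocity(RELATIVE_SIZES, reference_points=8)  # easy story = 8
anchored_at_2 = velocity(RELATIVE_SIZES, reference_points=2)  # easy story = 2

print(anchored_at_8, anchored_at_2, anchored_at_8 / anchored_at_2)
```

Under these assumptions the identical sprint scores 77 points with the reference story at 8 and 19.25 points with it at 2. The ratio between the two velocities is simply the ratio between the reference points, which says nothing about productivity.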

Well, somehow, in some form, a manager got a hold of this velocity information. (I think a consultant made an ill-informed decision to include the data on a couple of PowerPoint graphs during a status report). The question was raised by this manager, why is Team Alpha so much more productive than Team Beta? The consultants, correctly so, explained the rationale of relative estimations and why we should not compare the two teams with each other; rather, the slides were merely included to show, within the context of each team, how consistently they were performing, which is in accordance with the Agile Manifesto principle, "Agile processes promote sustainable development…".

Bar graph comparing the velocity of two teams

But paid consultants hold steadfast to their principles only as is prudent before they become anxious about their gig. This manager insisted on finding out why one team was so demonstrably outperforming the other. He also made it rather clear that it would be a good thing to be able to compare teams. So the consultant coaches were sent back to figure out a way to standardize the estimation points and to figure out how to improve Agile Pilot Team Beta. Team Beta, which had been up until this point highly motivated, naturally became quite perplexed. In the end, the only suggestion from the consultants to the manager was to standardize the way stories are estimated; for example, 1 story point = 2 hours, 2 story points = 4 hours, and so on and so forth and into the abyss goes our ability for agility!

Should story points be converted to hours of effort?

If a specific team chooses to consider the scope of work in terms of hours, that is their choice, but they should be made aware of the flaws with this approach. However, when attempting to standardize across teams, whether to achieve better predictability or to allow teams to be readily compared against each other, this becomes a major agile anti-pattern.

Why is this a bad idea?

  • Rather than promoting autonomous teams, it promotes scrutinizing behavior and unwarranted competition that corrupts the true purpose of the estimation in the first place. This leads to outside micro-management of the teams.
  • It negates the original intention of the relative sizing, i.e., “Team Estimation”.
  • Essentially, only the subordinate tasks are granular enough to understand effort in hours (and I do not recommend burdening the teams with this type of overhead).
  • Finally, it is completely invalid! One cannot universally compute Fibonacci numbers into working hours across the vast complexity of portfolio scope. Even within a single team, this conversion technique is flawed for reasons of bias discussed later.

True Story 2: A little information can be a dangerous thing

In an unrelated true story, a senior executive got his hands on the data for a master backlog burn-down chart. The original goal of this data and chart was to visualize the progress and a forecast as to when the entire backlog would be delivered; simply divide the total points of the backlog by the cumulative average velocity, extrapolate out the number of sprints, and voilà! As you might have already surmised, confidence in the progress was lacking, because this data also showed that the backlog was growing at a faster pace than the burn-down rate.
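The forecast arithmetic described above can be sketched in a few lines of Python. The function name, the `growth_per_sprint` parameter, and the sample numbers are assumptions added for illustration; the original chart only divided total backlog points by average velocity.

```python
import math


def sprints_remaining(backlog_points, average_velocity, growth_per_sprint=0.0):
    """Forecast the number of sprints needed to empty the backlog.

    Returns None when the backlog grows at least as fast as it is burned
    down, because the forecast never converges in that case.
    """
    net_burn = average_velocity - growth_per_sprint
    if net_burn <= 0:
        return None
    return math.ceil(backlog_points / net_burn)


# Hypothetical backlog of 820 points and an average velocity of 82 points.
print(sprints_remaining(820, 82))                        # static backlog
print(sprints_remaining(820, 82, growth_per_sprint=41))  # backlog still growing
print(sprints_remaining(820, 82, growth_per_sprint=90))  # growth outpaces burn-down
```

With a static backlog the forecast is 10 sprints. Once new work arrives at half the velocity it doubles to 20, and when growth outpaces the burn-down rate there is no finish date to report at all, which is the situation the chart revealed.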

So the exec had a plan. He would incentivize the delivery of … yes … story points. More Points = Higher Bonus!

"If you reward a team for delivering more points, they will deliver you more points." - Anonymous

The plan was really very simple to calculate: If the three teams maintained their combined average velocity of 82 points, then they would receive their standard target bonus. However, if they increased their combined average velocity by 10% (~90 points in this case) then they would receive a 10% increase to their target bonus. If, however, their combined average velocity decreased, they would lose 1% from their target bonus for each percent under the average, up to 10% of it (down to a combined average velocity of 74 points).
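As a rough sketch, the bonus rule could be written as the Python function below. The function name is hypothetical, and one detail is an assumption: the plan as described does not say what happens between 0% and 10% above the baseline, so the sketch treats that range as "standard bonus".

```python
def bonus_adjustment_percent(velocity, baseline=82.0):
    """Return the percent change applied to the target bonus.

    Assumption: a velocity increase below the 10% threshold earns no
    partial reward (the plan as described leaves this range unspecified).
    """
    change = (velocity - baseline) / baseline * 100.0
    if change >= 10.0:
        return 10.0            # threshold reached: +10% bonus
    if change >= 0.0:
        return 0.0             # standard target bonus
    return max(change, -10.0)  # lose 1% per percent under, capped at -10%


for points in (60, 74, 82, 91):
    print(points, round(bonus_adjustment_percent(points), 2))
```

Note that the only input is the number of points recorded. Nothing in the rule can tell nine extra points of delivered work apart from nine extra points of more generous estimating.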

Line graph showing average points velocity and thresholds

Well, this is quite a fool-proof incentive strategy, is it not? No possible way to game this system, right? (Please excuse my sarcasm.) After hearing of this new plan without having even been consulted, I delivered a carefully crafted warning of the risky repercussions regarding this plan. I will not go into the details of my correspondence, but I’m sure you can think of many points yourself.

I even called another agile coach and mentor of mine and explained what was happening, mostly to get a peer sanity check, and to see if I was missing something. My mentor, without hesitation, said to me: “Have the teams change every user story to 20 points, full stop!” In retrospect, I wish I had taken his advice, but I figured that my reasoning and rationale would prevail. It did not. The team members, to their credit and as expected, figured out that it was quite practical to always favor the higher numbers, based on, you know, unanticipated complexity. Unsurprisingly, they delivered a lot of story points as well. Too bad the same could not be said about measuring the value and quality delivered.

Estimation Constraints and Biases

While the book Thinking, Fast and Slow by Daniel Kahneman is not specific to agile themes, it is very relevant with respect to the various social-psychological facets of organizational culture.

My intention for this reference is to quickly summarize a few points. (1) When it comes to intuitive thinking, we rely too much on it and it is faulty. (2) When it comes to rational and analytical thinking, we often become lazy and rely on our intuition to save us time and effort. (3) Even when we make the effort to use our rational and analytical thinking, it is very biased and easily influenced.

Now, for the next exercise, please gather your new scrum development team together and ask them to perform the estimation exercise with these instructions:

  1. Estimate using a modified Fibonacci number system, which many probably have never seen before being introduced to scrum.
  2. Be objective when estimating.
  3. Consider all work that must be done, not just your own work.
  4. Estimate relative to the other work items, meaning you need a reference point.
  5. Estimate based on complexity and effort, but mostly complexity … I mean mostly effort … I guess it depends on who you ask. [ii]
  6. Please don’t forget to consider the level of uncertainty and risk.
  7. Now on the count of three, show me your cards; one … two … umm … somebody showed their card too soon!

Planning Poker Cards

If this is not already difficult enough, please ponder the following hindrances to objectivity:

Perceptions of an elephant

  • All of us have inherent biases with respect to complexity and effort, which can even change within ourselves from day to day, including our tendency to be over-confident in the beginning.
  • Most teams struggle to identify all risks and anticipate unknown events, especially when external dependencies exist.
  • Many team members come from distinctly different backgrounds and cultures, and therefore, view work differently.
  • Many teams have external members who have used other estimation techniques in past assignments, which influence their input.
  • Many backlogs have user stories that are poorly refined and not ready for prime time.
  • Many teams are not truly cross-functional; rather they are staffed with SMEs who handle a specific part of the team’s focus and often will not participate in the estimation that is unrelated to their own work. Moreover, many teams split up programming and testing tasks, which has a major impact on one’s view of the amount of effort. [iii]

Ok, so we get it. It is not easy to do, especially with a new team. But eventually, we can get the team to all agree on a final number, right? Right, we can, but often begrudgingly. In the beginning of a new Scrum team, we may have a few motivated individuals who really enjoy lengthy spirited story-points debates, such as why they will not drop from an 8 to a 5, or move up from a 2 to a 3. But in due time, this routine begins to grow tiresome.

Estimation Fatigue

Estimation Fatigue occurs when long session arguments drag some members down (e.g., a several-minute debate over whether a Story should be a 3 or a 5). They simply become frustrated with the exercise, which they view as arbitrary and a tedious waste of time. As a result, the motivation of the team declines and many team members become complacent; they quickly acquiesce with the strongest voice in the room or the first person to offer up a number. [iv]

Of course, we can try to coach our way out of the constraints, biases, and fatigue. We can bring in highly respected agile coaching consultants and hire top-notch scrum masters to coach these team members into embracing the user story estimation. Or we can introduce a scaling agile framework that requires 10 weeks of story-point estimation in order to plan our Iteration Loads. Or perhaps we can coach the managers as well. [v]

Here’s a thought. We can listen to the teams and try something new that they might suggest. Dogmatic adherence to a well-established theory or method, without scrutinizing the value-add, is absurd. Daniel Kahneman referred to this bias as “blind respect.” [vi]

Outcomes over Output: The Flow of Value

Here is an exercise for you: query all user stories in your current backlog that are refined and estimated, then find the average number of story points per user story. We can be confident that all the numbers begin to trend toward a central tendency and blur into a large laundry list of related items.
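If your tracking tool can export the estimates, the exercise takes only a few lines of Python. The story IDs and point values below are made up for illustration, and `summarize` is a hypothetical helper rather than part of any tool's API.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical export of refined, estimated stories: (story id, points).
backlog = [
    ("US-101", 3), ("US-102", 5), ("US-103", 5), ("US-104", 8),
    ("US-105", 5), ("US-106", 3), ("US-107", 13), ("US-108", 5),
]


def summarize(stories):
    """Return the mean, median, and most common estimate of a backlog."""
    points = [pts for _, pts in stories]
    most_common_value, count = Counter(points).most_common(1)[0]
    return {
        "mean": mean(points),
        "median": median(points),
        "most_common": most_common_value,
        "share_at_most_common": count / len(points),
    }


print(summarize(backlog))
```

In this invented sample, half of the stories sit on the same card and the mean lands within one point of it, which is the kind of clustering the exercise is meant to surface.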

But you may be asking yourself, why does that matter? It matters because we tend to lose sight of the true goal, which is to determine what we should deliver that will present the customer's desired outcome. If my bigger concern is being able to fit the right number of story points into a given iteration (whether it be a sprint or a program increment), then I have lost sight of the value delivery. I have witnessed teams re-plan sprints (within a scaled agile framework) to re-position a user story into a later sprint, simply because the number of estimated points did not fit within their capacity (or what they call the “Load”).

Moreover, we should consider analyzing our prioritization method in terms of how it is influenced. What are the factors that help us determine our prioritization decisions? For example, if we are using numbers, such as the WSJF (weighted shortest job first), are these numbers valid? I ask this because the Effort estimation (Job size/duration), which is also based on the Fibonacci numbers, has a significant influence on the calculation results on which we base prioritization.
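A small Python sketch shows how sensitive the ranking is to the job-size estimate. It uses the common WSJF formulation (cost of delay divided by job size, where cost of delay is the sum of business value, time criticality, and risk reduction or opportunity enablement); the two features and all of their scores are hypothetical.

```python
def wsjf(business_value, time_criticality, risk_reduction, job_size):
    """Weighted Shortest Job First: cost of delay divided by job size."""
    cost_of_delay = business_value + time_criticality + risk_reduction
    return cost_of_delay / job_size


# Two hypothetical features scored on the modified Fibonacci scale.
feature_a = wsjf(8, 5, 3, job_size=5)     # 16 / 5
feature_b = wsjf(13, 8, 5, job_size=8)    # 26 / 8

# The same Feature B if the team had shown the next card up for job size.
feature_b_one_card_higher = wsjf(13, 8, 5, job_size=13)  # 26 / 13

print(feature_a, feature_b, feature_b_one_card_higher)
```

With a job size of 8, Feature B (3.25) ranks ahead of Feature A (3.2). Move Feature B one card up to 13 and its score drops to 2.0, so the priority order flips on the strength of a single estimation debate.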

When we treat estimated effort and complexity points as factual scientific data, to the point of negatively influencing our delivery of value, we have lost sight of the true objective.

What is the customer asking for? What can we produce in a short time to quickly obtain feedback? These are the questions that we should be asking ourselves and our team members.

Quick tips:

  • When planning a sprint, do not let the number of story points or the calculated WSJF detract from the most desired outcome. Pay attention to the flow of value, the feedback loop, and question the validity of the prioritization calculation numbers.
  • An alternative option would be to take the more highly valued user story and break it down into its pieces (see next AC/DC section).
  • Above all, learn to create better customer-centric Sprint Goals and focus on those goals!

"Our highest priority is to satisfy the customer through early and continuous delivery of valuable software" - AMP 1


AC/DC

AC/DC logo

I cannot emphasize enough that we should really focus on the AC (acceptance criteria) and the DC (dependencies and constraints)! Have you ever experienced the awkwardness of demonstrating the functionality of a user story to the Product Owner and/or a vested stakeholder, then being called out for not satisfying the AC? I have experienced this, and worse. In one such situation, after being called out, we opened up the AC to review together on the big screen and the developer flat out said that he had never looked at the AC. WHAT?!

That was quite embarrassing. I swore from that day forward that during the sprint planning and again halfway through each sprint, we as a team would re-open each user story and re-review the AC. Then we would also go through the AC both before and after we demonstrated the functionality.

This got me thinking. I had encouraged my team members to create and modify their tasks based on satisfying the AC. I have noticed that unforeseen AC are discovered occasionally when the team is creating tasks. The two influence each other. Ultimately, the Product Owner wants the AC to be satisfied so that the team can deliver the desired value to the users. More importantly, however, the AC should be used to collect feedback and improve the next increment.

If you truly want to size up your backlog user stories, first go through the AC to make sure that they are accurate and comprehensive, and then ask the team if it is possible to deliver all of them within a specific iteration. If the answer is yes, set the User Story as ready to pull. If the answer is no or there is an air of uncertainty, then break up the user story.

AC = Acceptance Criteria / DC = Dependencies & Constraints

  • PO and team together write and understand all AC.
  • Create and align team tasks to fulfill the AC.
  • Revisit the AC during the Sprint.
  • Identify all dependencies before pulling the work and explore options to reduce them.
  • Take into consideration all constraints per user story (e.g., a limit on the number of clicks necessary to achieve the desired outcome, or no pop-up windows allowed).
  • Consider the risks to commitment that these cause.

The team will eventually learn how many user stories they can commit to within each sprint if the user stories are consistently created with small batch AC and it becomes routine to consider the DC. The team can save the unnecessary time it takes to arbitrarily estimate points by spending that time more wisely on other activities. (If relative estimation is still desired, do so after the AC/DC refinement; perhaps try using T-shirt sizing to avoid the misuse of quantifiable numbers.) When the team decides among themselves to simply pull in their comfortable number of user stories, with other user stories ready to pull in a stretch, you have a self-organized and autonomous team! We can then begin to measure the flow throughput, rather than the number of points completed.

"Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done." - AMP 5


Conclusion Summary

  • Estimation in Scrum is a tool to help teams understand and improve their ability to commit to finishing an increment within a time box, but the true redeeming quality of the estimation exercise is that it helps elicit the discussions about the true scope of work in a user story.
  • Using Fibonacci numbers as story points is one of several methods of estimation. The validity and value it provides are highly questionable and often negated by the team’s required investment of time, especially if the members begin to grow weary of the estimation technique.
  • In the wrong hands, story points and velocity can be misused, resulting in demoralized teams. Development teams should never be compared against each other in terms of performance data. Delivering story points should never be incentivized.
  • Estimation is an art that most teams eventually acclimatize into their ecosystem. [vii]
  • As a team grows and matures together, they will gradually and instinctually understand the amount of work that they can commit to, without using arbitrary numbers, which are not scientific, and furthermore can corrupt their collective intuition.
  • Focus on Sprint Goals and the AC/DC (Acceptance Criteria / Dependencies & Constraints)!

"Working software is the primary measure of progress." - AMP 7


I hope you found this article informative and perhaps entertaining.

Best of luck as always in being and becoming agile!

Endnotes

#estimation #useradoption #valuedriven #agile #agilemindset #agilecoaching #flow
