How would you rank CFPs, Story-Points, Story-Counts, and Story-Days?

I’m already getting ahead of myself – what are story-days?

Well, if Ron Jeffries had it to do over again, he might have introduced story-days instead of story-points. This is just my own personal conjecture, based on story-points being a 20-year experiment gone bad.

“I may have invented story points, and if I did, I’m sorry now.” – Ron Jeffries

Most of us are already well aware of the problems with story-points. They’re just guesses made by an Agile software development team during a Planning Poker Game – played at the time when the team knows the least about the user stories (usually just before the week in which work on the story will begin).

They’re not even a true metric – they’re just a consensus of guesses by the team, and they are fundamentally flawed. Once two (or more) teams are working on the same project, story-points cannot be used (despite what SAFe says about its normalized story-points) to size a project, or to size each story within a project.

One wonders how and why story-points became so popular.

COSMIC: Recently, I’ve discovered that COSMIC Function Points (CFPs) are a true, objective, ungameable metric. CFPs can be used universally, to compare the size of any projects – whether by a single team; by multiple teams working together on the same software project; or even on different software products in different vertical markets, in different environments, using different programming languages, and different productivity tools.

They can be used to estimate the ‘size of the project’ (where a project may be anywhere in the rough range of 2 weeks to 12 months).

Even ongoing development of a long-term software product is reasonably split into multiple projects, where each version of a product may be called a project. Calling a 2-week Scrum sprint a project might be a stretch, although the Scrum Guide does state: “Each Sprint may be considered a short project”. But for the purposes of this article – which includes all Agile software development work that might be timeboxed into short 2-week sprints – calling any reasonable time duration a project works for me.

It’s worth noting here that CFPs don’t estimate the ‘requirements effort’ or ‘estimating effort’ of the project (which is relatively smaller) – just the effort to develop the project’s software (which is relatively larger).

Story-Days: Now back to story-days. Prior to when user stories were the most popular way to define requirements (pre-XP and pre-1998), project estimates used ideal-days and elapsed-days to estimate the scope of the project and its duration. Today, estimating in units of time (hours, days, weeks, sprints, months) is still fairly popular, but maybe less so now that requirements are broken down into epics and stories shortly before the work is done (rather than in a way-up-front Software Requirements Specification).

A story-day is a term that I just coined today. It’s like an ideal-day, but it’s used to estimate the size of a story, rather than to estimate the size of a larger chunk of work. It can be used in the same way that story-points are now used. (Full disclosure: I’ve never actually used story-days, so this is only in theory). The only difference is that the estimators don’t have to obfuscate time when they estimate. Thus, they also don’t need to use reference stories (so story-days are – arguably – easier to use and understand).

Story-day estimators (the dev team members, playing their Planning Poker Game – in a slightly different way) may still mentally compare (for example) a 3-day story to other stories that also took about 3 days, but no reference stories are needed to do this (thus no ‘story-point drift’). Drift occurs when a reference story’s perceived size gradually shifts away from its original baseline. The story-days unit-of-measure (UoM) won’t drift.

As @Michael Kusters recently pointed out – devs often think in terms of days anyway, so why even bother with the additional mental gymnastics of converting time to story-points, and then back again to time?

I should also point out – if it’s not already obvious – that a story-day UoM is not the same as a time duration estimate. The story-days UoM still only estimates the size of the story – it doesn’t estimate its completion date.

By using story-days (instead of story-points), a team can still work out their velocity per week – or velocity per sprint – in the usual manner (and thus still be able to estimate the completion dates of the stories), with the only difference being that they’re not using story-points as their UoM.

If story-days are used, I recommend that teams use the same Fibonacci number series used for story-pointing. But I suggest limiting the largest size to be 8 story-days. Any stories larger than 8 should be split. Some teams may even prefer that 5-day or 8-day stories be split as well.

Again, don’t confuse story-days with elapsed days. When estimating in story-days, don’t take meetings or other overhead time into account. Those are accounted for when the work is actually done – when velocity is computed. Velocity already accounts for any team overheads, like meetings or vacation absences.

Unlike story-points, velocity isn’t flawed. As long as velocity has a solid UoM for story-sizing, velocity works well – it’s a good way to determine the pace at which a team works (and every team has a different velocity).
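Taken together, the story-day sizing rules (the Fibonacci scale, capped at 8 story-days, with anything larger split) and the velocity bookkeeping above can be sketched in a few lines of code. This is a hypothetical illustration – the function names and the sample numbers are my own, not any team’s actual data:

```python
# Hypothetical sketch: sizing and velocity with a story-day UoM.
# All names and numbers here are illustrative, not from the article.

FIBONACCI_SIZES = {1, 2, 3, 5, 8}  # largest allowed story is 8 story-days

def validate_size(story_days: int) -> int:
    """Reject sizes off the Fibonacci scale; stories larger than 8 should be split."""
    if story_days not in FIBONACCI_SIZES:
        raise ValueError(f"{story_days} story-days: split the story or re-size it")
    return story_days

def velocity(completed_sizes_per_sprint: list[list[int]]) -> float:
    """Average story-days completed per sprint, over the sprints observed.
    Overheads (meetings, vacations) are already baked into what got completed."""
    totals = [sum(sprint) for sprint in completed_sizes_per_sprint]
    return sum(totals) / len(totals)

def sprints_remaining(backlog_sizes: list[int], v: float) -> float:
    """Forecast: remaining story-days divided by velocity."""
    return sum(backlog_sizes) / v

# Three past sprints, each a list of completed story sizes in story-days:
history = [[3, 5, 2], [8, 1, 3], [5, 5]]
v = velocity(history)                              # (10 + 12 + 10) / 3 = 32/3
remaining = sprints_remaining([8, 5, 3, 3, 2], v)  # 21 story-days / velocity
```

Note that nothing here depends on story-points – any solid size UoM on the input side makes the velocity arithmetic work.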

If Ron Jeffries had clarified this, then he might not have needed to invent story-points. There’s no problem with velocity per se. So do you think Jeffries should have just used story-days and velocity?

Keep in mind that story-days are still guesses at how big the stories might be – they’re still highly subjective because they’re still prone to all of the biases (and environmental/contextual influences) that a team might have when they’re guessing each story’s size. Thus, I would personally rank story-days as only marginally better than story-points. Its main advantage is that – ironically – it obfuscates story-points. :)

Also keep in mind that a story-day is not the same as a story-person-day. The only framework that I’m aware of where story-person-days are used is in SAFe – where they calibrate their normalized story points by starting with one story-point per person-day of ‘ideal-time effort’. Except for SAFe, all other types of teams normally mean ideal-team-days when they say that they strive for their stories to fit in (for example) 2-3 days.

Story-Counts: Let’s now discuss story-counts. This is the way #NoEstimates gurus suggest that stories be sized. Every story is deemed to be a size “1”, so they’re sized simply by counting them. How simple! The number of stories done per week (called throughput) is easily counted at the end of each work-week, so there’s less guess-work than with story-points (which, by contrast, are never updated after the work is done).

However, when estimated before the work is done, story-counts are still just guesses. And the actual *size* of each story counted really isn’t a “1” - the stories counted might range from as little as a single day’s work to as much as a full week’s worth of work. So whether a story has been split to be small enough to fit in (for example) 2-3 days is still just a guess – albeit a much easier guess to make than when a team guesses using story-points.

As long as stories are kept small (2-3 days per story), it’s fairly easy to guess how many might be done in the week – the answer is: “about two!” :) It’s out of scope for this article to explain all the reasons why smaller stories are better than bigger ones, but making it easier to forecast how many can be done per week is not the only one.

My impression is that #NoEstimates gurus recommend story-counts because they’ve determined that sizing stories via story-points is fundamentally flawed, and they consider CFPs to be ‘way more complicated’ - so what option is left? In this respect, followers of #NoEstimates can be considered minimalists.

With the advent of requirements automation tools like ScopeMaster however, I don’t consider CFP counting to be more complicated. In fact – if automated – it’s far easier (and far more objective) than story-pointing.

Story-counts are all that’s needed to determine throughput; draw sprint burndown charts; draw release (or weekly) burnup charts; and draw CFDs (Cumulative Flow Diagrams).
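As a sketch – with illustrative numbers of my own, not real team data – story-count throughput and the simple duration forecast it enables look like this:

```python
# Hypothetical sketch of #NoEstimates story-counting: every story counts as a
# "1", and throughput is simply stories finished per week. Data is illustrative.

def throughput(finished_per_week: list[int]) -> float:
    """Average number of stories completed per week."""
    return sum(finished_per_week) / len(finished_per_week)

def weeks_remaining(stories_left: int, finished_per_week: list[int]) -> float:
    """Forecast remaining duration from historical throughput alone."""
    return stories_left / throughput(finished_per_week)

history = [2, 3, 1, 2]              # stories finished in each of 4 past weeks
tp = throughput(history)            # 8 stories / 4 weeks = 2.0 ("about two!")
eta = weeks_remaining(10, history)  # 10 stories left / 2.0 per week = 5 weeks
```

No sizing ceremony at all – the only discipline required is keeping the stories split small enough that counting them as equal is roughly honest.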

My experiences with story-points and story-counts: Several years ago, I did an analysis of burnup charts for about 30 Scrum teams that I worked with – one set of charts with story-points on the Y-axis, and another set (for the same 30 teams) with story-counts on the Y-axis. The lack of differences in the shapes of the curves convinced me that the extra work involved in playing the Planning Poker Game wasn’t worth the effort.

Retrospectively, it would have been interesting to observe the shapes of the curves if the CFP UoM had been used. Burnup charts are only as accurate as the UoM on the Y-axis allows them to be (garbage in – garbage out).

Over a decade ago, it was quite popular to estimate in story-points and also break down stories into tasks and estimate the tasks in hours for capacity planning. On one project during this time, I found a very poor correlation of story-points to task-hours. Again, that told me that story-points weren’t a very reliable UoM for estimating effort.

One might argue that some of the differences may have been because of delays due to cross-team handoffs, but our team was a very strong cross-functional team (with many T-shaped devs), and we very rarely had dependencies on any other teams – so that doesn’t account for the observed lack of correlation.

Task-hour estimation errors might also have accounted for the poor correlations, but I trusted my team’s task-hour estimates more than their story-point estimates. This raises the question: are story-days – and task-days – simply more trustworthy than story-points?

Retrospectively, it would have been interesting to observe what the correlation might have been between CFPs and task-hours. I suspect that the correlation would have improved.

I’m now getting closer to my over-arching question of “How would you rank CFPs, Story-Points, Story-Counts, and Story-Days?”, but bear with me, while we discuss the remaining sub-topics.

CFPs: Now back to CFPs – COSMIC Function Points. If you followed a couple of my recent posts and articles on my personal LinkedIn feed, you will know that CFPs are a second generation Function Point measurement ISO standard, governed by a standards organization called COSMIC (an acronym for Common Software Measurement International Consortium). CFPs are an objective way to estimate user stories (and other forms and levels of requirements). For the purpose of simplicity, I will only discuss the ‘business application’ level of requirements commonly described within Agile user stories today.

As with story-days and story-points, CFPs can be used as the UoM to compute a team’s weekly (or any other time period) velocity. The only differences are the UoMs, and the accuracies of the UoMs.

Story-points are usually estimated just prior to the week of the work being done, because if they’re done any earlier (and if the project’s requirements change), then the time spent estimating them will have been wasted (and might have to be redone). This is in sharp contrast to CFPs, where there is no disadvantage to estimating the size of the stories too early. In less than a minute per run, the stories can be modified and re-estimated.

Don’t assume that CFP estimating is perfect – it’s not. It’s based on counting logical entities and logical actions (Entries, Exits, Reads, and Writes) implied by the story, and clearly each of these four action types doesn’t require exactly the same effort to develop – they just require ‘approximately the same effort’ (it all averages out).

Furthermore, these ‘logical transactions’ (actions between entities) can represent many different ‘physical transaction’ designs (and there are many software design alternatives, which aren’t considered). So once again, they’re just ‘rough approximations’.

CFP estimates are purposely agnostic to physical implementations – which will be reflected in each team’s CFP-based velocity (thus cost and duration), but that’s a different metric than the teams’ story sizes. CFPs only measure scope – not cost or duration (or even quality – other than the quality of the user stories themselves).
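As a rough sketch of the counting rule described above – one CFP per identified data movement – here’s an illustrative example. The decomposition of the cash-payment story into four movements is my own guess (and this toy version ignores COSMIC’s de-duplication rules for repeated movements on the same data group), though it happens to match the 4 CFPs reported for that story in my trials later in this article:

```python
# Hypothetical sketch of COSMIC sizing: a story's CFP size is the count of its
# data movements (Entries, Exits, Reads, Writes), each worth exactly 1 CFP.
# The movement list below is my own illustrative analysis, not ScopeMaster output.

DATA_MOVEMENTS = {"entry", "exit", "read", "write"}

def cfp_size(movements: list[str]) -> int:
    """Each identified data movement contributes exactly 1 CFP."""
    for m in movements:
        if m.lower() not in DATA_MOVEMENTS:
            raise ValueError(f"not a COSMIC data movement: {m}")
    return len(movements)

# "Pay for my ticket with cash" might decompose into four movements:
movements = [
    "entry",  # the ticket request enters the system
    "read",   # read the ticket price
    "write",  # record the cash sale
    "exit",   # issue the ticket / confirmation
]
size = cfp_size(movements)  # 4 CFPs for this story
```

The point is that the inputs are observable facts about the requirement (which data crosses which boundary), not a team’s consensus guess – which is what makes the measure objective and ungameable.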

The 3Cs, the 3Rs, and the 3Ws – plus the DoR and DoD: The 3Cs (Card, Conversation, Confirmation) can be a form of unwritten requirements that are implied by verbal Conversations but never written down. That way-of-working might be used by some advanced/mature Agile teams on some mature products that are now in ‘support mode’, but it’s not the most common way that most greenfield software products are built (or any other reasonably-sized projects or products – even if they’re not ‘greenfield’).

Furthermore, teams that work by strictly following the 3Cs may not even document their Conversations, and likely don’t even have a strong need to estimate their work – they simply work in a prioritized one-story-at-a-time manner, and don’t need to know the size of a story before they start working on it (as long as it’s reasonably small).

Some teams working in this way don’t even update or save their user stories once the work is done – they simply discard the Cards. This means that they will never know how big their stories were – but that doesn’t matter much to them. So that workflow scenario is out-of-scope for this discussion.

This article is for teams that use written user stories that are well-formed (using the Connextra format), and that write those user stories at any time before the week in which the work is done, or even immediately after the work is done (so that the size of the stories and the size of the entire project – and each story's textual content – can persist for future reference).

But this article excludes any discussion of the stories being updated during the week of the work being done (mainly because that scenario was covered in previous articles and posts, and we need not re-cover old ground).

There’s nothing wrong with mixing the 3Cs with the 3Ws. That is, while the team is collecting requirements from users (the Conversation), they can also write the user stories in a Who-What-Why format. Feedback from customers on these well-formed user stories (and their Acceptance Criteria) will tell the team – in real time – whether they’re on the right track or not.

There is no need to write the stories too far “up front” - they can be written at any time up to the start of the week when the story will be worked on – following a requirements gathering and feedback process that includes the 3Cs as well as the 3Ws.

Each story must be written, and should [preferably] be “Ready” to start development work on. For most project teams, stories become “Ready” during (or shortly before) the team’s Sprint Planning Meeting (Scrum) or the team’s Replenishment Meeting (Kanban). For the work to proceed, the user story must pass the team’s DoR (Definition of Ready) criteria.

The reason for a DoR is that – for most projects and most teams – this is the most effective way to work, and it’s the most commonly accepted way-of-working.

“Some people think that the Definition of Ready is a stage gate. No it's not. It’s an agreement between those identifying the work and those doing the work that the story is ready to work on.” – @Al Shalloway

Even if those identifying the work are the same people as those doing the work (recommended to some extent – but varies by context), a DoR is still quite useful.

Even if any stories have changing requirements during the week that the work is done, a constraint of this article is that they must be updated as part of the team’s DoD (Definition of Done). Otherwise, the size of the project will remain a mystery, and its stories won’t be comparable to other stories done later in the same project (or in other projects).

Connextra: Before we move on, what is the Connextra format, and why is it required? Sometimes called the 3Rs (Role, Requirement, Reason) but more commonly called the Who, What, and Why (the 3Ws), the Connextra format (invented in 2001 by a software development team at Connextra Ltd – a company in England) states the story in the vernacular of an end-user (and as told by an end-user), specifying the Who, What, and Why (while striving to avoid the How).

Connextra is not the only format for user stories, but it’s the most popular one. The 3Cs style (Card, Conversation, Confirmation) may have been the first, but the 3Ws remains as the most popular and best way to articulate user stories in written form – that are readily parsed by an automated CFP counter.

Rightly or wrongly, CFPs can be readily computed from user stories by ScopeMaster. They’re called Functional User Stories in the tool – presumably to differentiate any additional details they might have from more classic – and less detailed – user stories.

User stories are ubiquitous, so we go with the flow. We use what we have. Likewise, I understand that manual CFP counters use any and all project documentation to estimate scope sizes – including user stories. They use what they have. So any discussion of whether user stories are the right documentation vehicle (or not) to specify ‘requirements’ is out of scope for this article. It is what it is.

A word of caution though, if any Agile team members think that they can simply estimate manually using CFPs: I don’t think that they can. IMHO, if any devs have a dual role of writing user stories and estimating the size of the work required to complete the story, the temptation may be too great to dive too far into the How of the story, rather than just specifying the Who and the What. (This may also be the bane of dev teams estimating in story-points). Thus, CFP counters are specialists that work from outside the project teams (or work independently from the rest of the teams).

For some of these reasons, I’ve separated automated CFP counting from manual CFP counting – because their use cases are quite different.

ScopeMaster: Finally, what is ScopeMaster? Instead of me explaining the tool to you second-hand, allow me to quote from their “Automated Function Points Analysis” blog page (https://www.scopemaster.com/blog/automated-function-points/):

Agile User Stories Sized Automatically: “At ScopeMaster we took on the challenge of automating function point estimation from written requirements. By employing multiple language analysis techniques, including Natural Language Processing (a branch of artificial intelligence) ScopeMaster is successfully determining a function point size estimate directly from written requirements. ScopeMaster is the world’s first tool to do this.” – @Colin Hammond, Founder

The ScopeMaster blog page summarizes what the ScopeMaster “automated requirements sizing tool” is – far better than I could.

My experiments: I tried ScopeMaster’s free online tool. You can do some basic experiments with https://storytest.co – a limited but free subset of ScopeMaster.

Trial #1: As a teenage movie-goer, I want to pay for my ticket with cash, because I don’t have a bank account. (Results: 14 secs Run Time; 4 CFPs; 79% Quality Score)

Trial #2: As an adult movie-goer, I want to buy my ticket by debit card, because I don’t carry much cash. (Results: 14 secs Run Time; 4 CFPs; 79% Quality Score)

Trial #3: As a busy movie-goer, I want to buy my ticket online, so that I don’t have to wait in line at the ticket-counter. (Results: 11 secs Run Time; 0 CFPs; 51% Quality Score)

The tool was unable to analyze my third trial, stating "Ambiguous, no clear functional intent". I’m not sure why it couldn’t analyze it, but I do know that it ignores the ‘Why’ part of user stories. Two of the tips given were “Use verbs that infer data movement e.g. validate, save, read, import”, and “Use verbs that are unambiguous in describing movement of data, such as save, store, read, validate, verify, send, publish, display, modify, update, delete”. But it seemed to understand the verb “buy” in my 2nd trial, so why not in my 3rd trial?

I’m also aware that the free tool provides a limited subset of the paid version. For example, the 3 stories here are obviously from the same project, and there would be considerable overlap of functionality when developing these 3 stories.

The free version has no way of knowing this (each trial is completely separate from the rest), but the paid version would have some setup parameters (such as defining the project’s personas and entities), and would avoid double-counting by ignoring duplicate/reusable entities and actions (per COSMIC rules).

I'm left wondering whether my third user story wasn't well-formed, or whether the free tool lacks some functionality of the full ScopeMaster tool.

Ranking: Finally, how would *you* rank these 5 ways of estimating user stories?

  1. CFPs – using ScopeMaster (or other automated service)
  2. CFPs – using human FP counters
  3. Story-Counts
  4. Story-Days
  5. Story-Points

This is my ranking, but I concede that for smaller projects, #2 may not be cost-effective (although it will be more accurate), whereas #1 will always be more cost-effective than #2, #4, or #5 (which are all labour-intensive).

One final argument re #1 – compared to [especially] #5: CFPs don’t have to be 100 times better than story-pointing to rank higher – they just have to be a little bit better (more accurate/faster/easier). But IMHO they are a *lot* better!

Re #1 and #2, this also raises the question of how long it will be before human CFP counters are automated out of existence. :)

One final question (and perhaps a loaded one): What UoM do you think Ron Jeffries would have used and recommended 20 years ago, if automated CFP counts were available back then – and if he knew that story-points would turn out to be flawed? :)

Thanks in advance for any comments, questions, or opinions that you may have regarding this article. Hopefully, you enjoyed reading it, and you learned more about this topic.

– Kirk Bryde (June 18, 2021) – 

#CFP #cosmic #ScopeMaster #FunctionPoints #Agile #agilesoftwaredevelopment #agileprojectmanagement #agileprogrammanagement

Nirav D. Gajerawala

Program Manager at Altera Digital Health

Insightful!!

Craig Imlach

Scrum Master - NAB

Story-days, hmmmmm … not sure I 100% subscribe to this. In one or two roles we used the term ‘work chunk’ or ‘focus period’ - in each role we were using a variation of the pomodoro technique where we created work periods dedicated to ‘productive work’ (or focus times). So we started estimating work by these periods as we knew how many of these work chunks we had scheduled in the week. In Organisation 1 we used 60-minute chunks (55 minutes of work and a 5-minute break) with 4 chunks defined per day. In Organisation 2 we divided the day into four work chunks, with 10:30am - 12:00pm and 1:00pm to 3:30pm being the ‘productive work’ periods. This was based on productivity research that indicated a person is able to do 15-20 hours of deep focus work per week, and we planned for the start and end of days for planning, experiments, learning, and waiting for unit and integration tests to complete. Both teams started to break down stories into ‘work chunk’ sizes simply because they found finishing an item within a work chunk less mentally draining and more satisfying. For me, the best thing was the mental coherence that developed across the team and the little rituals the teams developed to get in the groove.

I will throw this out as a rhetorical question, but I have long wondered why there is all this effort put into estimating the time and effort to do the work, and none put into estimating the value that will result from what is built. I argue that it is the latter that is important to the organization.

Would there be much discussion about this if: 1) estimation took 75-90% less time? 2) management understood this was a learning tool and not a forecast? 3) estimates were used only to compare to the past and never used to beat up teams? The first can be readily achieved by using Steve Bockman's team estimation; the second and third are lessons from Lean. The fact that planning poker, an outdated, cumbersome method that was created when individual teams were working on relatively small projects, is still being used says a lot about learning in our industry. If you're going to estimate, I strongly recommend using Team Estimation by Steve Bockman.
