Metrics Meltdown: The Untold Reality of Software Development Metrics
When misinterpreted metrics crown Mr. Satan as the champion, while the real heroes aren't even on the radar.

Introduction

Measuring performance and progress is crucial in any field, and software development is no exception. For agile teams, measuring may seem like a straightforward task: just sit down, discuss what you want to measure, and come up with some metrics. However, it's not that simple. In an era where data engineering is booming, we can apply its lessons to the realm of software development.

Common Metrics in Agile Development

In the realm of agile development, metrics help understand how a team is performing and where it can improve. Some common metrics include:

  • Sprint Velocity (Scrum): Measures how much work a team can complete in a sprint.
  • Throughput (Kanban): Measures the amount of work completed in a given time period.
  • Sprint Failure Rate (Scrum): The proportion of stories not completed in a sprint.
  • Lead Time and Cycle Time (Kanban): The time from when a request is created until it is delivered, and the time the work actually spends in progress once it has started, respectively.
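
To make these definitions concrete, here is a minimal sketch of how such metrics could be computed from a team's work-item records. The `WorkItem` structure and its field names are hypothetical placeholders for whatever your tracking tool actually exports, not part of any specific product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical work-item record; the fields are illustrative, not tied to any specific tool.
@dataclass
class WorkItem:
    created_at: datetime   # when the request entered the backlog
    started_at: datetime   # when work actually began
    finished_at: datetime  # when the item was completed
    story_points: int

def sprint_velocity(items: list[WorkItem], sprint_start: datetime, sprint_end: datetime) -> int:
    """Story points completed within a sprint window (Scrum velocity)."""
    return sum(i.story_points for i in items if sprint_start <= i.finished_at <= sprint_end)

def throughput(items: list[WorkItem], since: datetime) -> int:
    """Number of items completed since a given date (Kanban throughput)."""
    return sum(1 for i in items if i.finished_at >= since)

def lead_time(item: WorkItem) -> timedelta:
    """Time from the moment the request was created until it was delivered."""
    return item.finished_at - item.created_at

def cycle_time(item: WorkItem) -> timedelta:
    """Time the item actually spent in progress once work started."""
    return item.finished_at - item.started_at
```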

Avoiding Confirmation Bias

The first step is to avoid confirmation bias. If the person who is going to "analyze" the results already knows the data and has formed a judgment before the analysis is even designed, the metrics will only confirm a decision that had already been made or would have been made anyway; in other words, they serve as mere justification. For example, a manager might be predisposed to believe that a certain process needs to change. If they focus only on the metrics that back up that predisposition, they could ignore other vital metrics suggesting that the current process is effective, leading to decisions that are detrimental to the team.

Confirmation bias is the tendency to interpret or recall information in a way that confirms pre-existing beliefs or values, ignoring evidence that could challenge them.

Understanding the Null Hypothesis

The first thing here is to understand the concept of the null hypothesis. If metrics are not going to change the default action, they are useless. That requires being clear about what the default action is, and the typical "better than nothing" excuse for doing improvised things without clear objectives does not count. In the absence of arguments, hypotheses, and data, the best decision may well be to do nothing; believing that doing something unstructured will automatically lead to improvement is a fallacy. This situation can be analyzed through the lens of the Nirvana Fallacy, where a realistic solution is compared to an idealized one, ignoring the feasibility of the latter.

The null hypothesis is a statistical assumption that suggests there is no significant difference between the data sets or variables under study. It serves as a statement to be challenged and refuted to determine the relationship between the variables.
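
As an illustration, the sketch below tests a null hypothesis against invented data: H0 says a process change made no difference to cycle time, and only if the data let us reject H0 do we move away from the default action. It assumes SciPy is available; the numbers and the 0.05 threshold are conventional placeholders, not a prescription.

```python
from scipy import stats

# Illustrative cycle times (in days) before and after a hypothetical process change.
cycle_times_before = [5.1, 6.3, 4.8, 7.0, 5.5, 6.1, 5.9, 6.4]
cycle_times_after  = [4.2, 5.0, 4.6, 5.3, 4.9, 4.4, 5.1, 4.7]

# H0 (null hypothesis): the change made no difference to the mean cycle time.
# H1: the mean cycle time differs after the change.
t_stat, p_value = stats.ttest_ind(cycle_times_before, cycle_times_after)

if p_value < 0.05:  # conventional significance threshold
    print(f"Reject H0 (p={p_value:.3f}): the data support changing the process.")
else:
    print(f"Fail to reject H0 (p={p_value:.3f}): stick with the default action.")
```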

Metrics and Context: A Deeper Look with the Example of Story Points

Metrics in the realm of software development can be of great help, but their true utility is deciphered when they are correctly contextualized. For example, atomic commits are good, but depending on the project, sometimes a larger set of changes is necessary to maintain code coherence. Fewer lines of code may be better for readability and maintenance, but not if that compromises clarity or code functionality. Faster delivery is excellent, but not at the cost of product quality. Fewer bugs are ideal, but are other important features being sacrificed to achieve this?

A case that illustrates well how a metric can be interpreted and used differently depending on the context is that of Story Points (SP) in agile project management. Although Story Points are a valuable metric, their utility lies in a team's ability to apply them consistently and predictably. SPs have no inherent meaning for a particular story or person; instead, they are an abstract concept that acquires statistical meaning as estimates accumulate in an SP database and as long as the team operates in a predictable and consistent manner. This consistency and predictability are crucial for SPs to serve as a useful representation of team velocity.

Pivotal Tracker, for example, employs a methodology that allows teams to choose among various scales (linear, Fibonacci, powers of 2, or a custom model) to assign Story Points to tasks. The idea is to break stories down into tasks that are as small and consistent as possible, making the team's velocity more predictable. Moreover, it suggests that points should be relative to each other, meaning a point in design should require the same effort as a point in development. This approach seeks to maintain consistency in estimation, which, in the long run, contributes to a more predictable and reliable team velocity.

The methodology of Pivotal Tracker stresses the importance of consistency and contextualization when using Story Points. It calculates the "velocity" of the team based on the average number of Story Points completed over a predefined number of iterations, automatically planning future iterations based on the team's demonstrated capacity.
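
As a rough sketch of that idea (the three-iteration window and the function names below are my own assumptions, not Pivotal Tracker's actual implementation), velocity can be computed as a moving average of completed Story Points and then used as the capacity for planning the next iteration:

```python
def velocity(completed_points_per_iteration: list[int], window: int = 3) -> float:
    """Average story points completed over the last `window` iterations."""
    recent = completed_points_per_iteration[-window:]
    return sum(recent) / len(recent) if recent else 0.0

def plan_next_iteration(backlog_points: list[int], capacity: float) -> list[int]:
    """Greedily fill the next iteration with backlog stories until capacity is reached."""
    planned, total = [], 0
    for points in backlog_points:
        if total + points > capacity:
            break
        planned.append(points)
        total += points
    return planned

history = [21, 18, 24, 19]    # SP completed in past iterations (illustrative)
capacity = velocity(history)  # average of the last three iterations, roughly 20.3 SP
print(plan_next_iteration([8, 5, 5, 3, 8], capacity))  # -> [8, 5, 5]
```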

This approach highlights how a metric, although abstract, can be instrumental in understanding and improving team efficiency when applied and interpreted within the right context. It also underscores how different methodologies can influence the interpretation and utility of a metric, emphasizing the importance of understanding and contextualizing metrics before relying on them for critical decisions.

Navigating the Metrics Maze: A Prelude to Ensuring Data Quality

Before delving into the mechanics of metrics, a pivotal concern arises around the quality of data. Data is the cornerstone upon which the edifice of metrics stands. However, ensuring data quality isn't merely about data capture; it encompasses the decisions, maturity, and discipline of teams. If we operate under the principle of "anything is better than nothing" and hastily generate data and dashboards, the outcome is likely to be counterproductive.

Rushing into metrics generation can lead to a scenario where developers feel harried and managers make arbitrary decisions justified by skewed or inadequate data. For instance, a bias for or against a particular developer or team could be rationalized on the strength of hastily assembled metrics. The scenario worsens if incentives are tied to those metrics: it's a well-acknowledged fact that humans, especially developers with a knack for systems, will find ways to game the system.

In such a setup, individuals might channel their efforts into merely improving their metrics rather than genuinely enhancing the team's performance or solving critical issues. This is detrimental not only to the team's morale but also to the overall project's success. The most committed developers, whose focus is on solving problems, might find themselves bogged down, while, ironically, less competent individuals who game the metrics might shine.

Therefore, the question isn't about the goodness of measuring but what to measure and how. The adage, "What gets measured gets managed," rings true, but a hasty approach towards measurement can lead to a mirage of management, veiling the real issues and diverting focus from genuine improvement.

This predicament emphasizes the crucial need for a structured approach towards ensuring data quality before embarking on the metrics journey, setting the stage for the subsequent discussion on fostering data quality and team maturity.

Ensuring Data Quality and Fostering Team Maturity

Before diving into metrics, the primary focus should be on generating and capturing quality data. This task is not a walk in the park; it demands a governing structure to incentivize discipline and consistency within teams. Without quality data, any metric derived is likely to be misleading, serving more as a source of confusion rather than clarity. This crucial step requires an investment not just in technological tools but also in nurturing a mature team culture that values data accuracy and integrity. Misinterpreted or misused metrics can lead to arbitrary decisions, which in turn can demotivate developers, often causing a rift between management and development teams.
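
What that governing structure looks like in practice will vary, but one minimal, hypothetical example is a set of automated checks that flag records before they ever feed a dashboard. The field names below are assumptions for illustration only:

```python
from datetime import datetime

def data_quality_issues(record: dict) -> list[str]:
    """Return a list of quality problems for one hypothetical work-item record."""
    issues = []
    # Completeness: every record must carry the fields the metrics depend on.
    for field in ("id", "created_at", "started_at", "finished_at", "story_points"):
        if record.get(field) is None:
            issues.append(f"missing field: {field}")
    # Consistency: timestamps must be ordered, points must be positive.
    try:
        if record["created_at"] > record["started_at"] or record["started_at"] > record["finished_at"]:
            issues.append("timestamps out of order")
    except (KeyError, TypeError):
        pass  # already reported as missing above
    if isinstance(record.get("story_points"), int) and record["story_points"] <= 0:
        issues.append("non-positive story points")
    return issues

record = {"id": "DX-42", "created_at": datetime(2024, 3, 1),
          "started_at": datetime(2024, 3, 3), "finished_at": datetime(2024, 3, 2),
          "story_points": 5}
print(data_quality_issues(record))  # -> ['timestamps out of order']
```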

Defining Clear Objectives: The OKR Framework

Once a robust data foundation is laid, the next step is to define clear objectives and expected results before generating metrics. Whether a metric is positive or negative solely depends on how it contributes to the desired outcomes and objectives; everything else is irrelevant. This clarity is paramount before initiating any measurement activity. Every activity should be measured against these predefined objectives and results. Here, a fundamental discipline emerges: any activity that can't be expressed in terms of objectives and results should either be eliminated or trigger a review of the objectives and results.

The OKR (Objectives and Key Results) framework is a strong ally in this endeavor. It aids in defining, tracking, and measuring objectives alongside the key results expected to be achieved. This framework ensures that the metrics generated are aligned with the goals, providing a clear roadmap for teams to follow. It's not about measuring for the sake of measuring but about understanding how each metric propels or hinders the achievement of set objectives.
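
A minimal sketch of this idea, with invented objective and metric names, is to represent each key result as a metric with a baseline, a target, and a current value, so that every number on a dashboard maps back to an objective and progress is always expressed against it:

```python
from dataclasses import dataclass, field

@dataclass
class KeyResult:
    metric: str      # name of the metric that tracks this result
    baseline: float
    target: float
    current: float

    def progress(self) -> float:
        """Fraction of the way from baseline to target (clamped to [0, 1])."""
        span = self.target - self.baseline
        if span == 0:
            return 1.0
        return max(0.0, min(1.0, (self.current - self.baseline) / span))

@dataclass
class Objective:
    name: str
    key_results: list[KeyResult] = field(default_factory=list)

    def progress(self) -> float:
        if not self.key_results:
            return 0.0
        return sum(kr.progress() for kr in self.key_results) / len(self.key_results)

okr = Objective("Improve product stability", [
    KeyResult("open_bug_backlog", baseline=200, target=140, current=170),      # 30% reduction goal
    KeyResult("escaped_defects_per_release", baseline=12, target=6, current=9),
])
print(f"{okr.progress():.0%}")  # -> 50%
```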

Transitioning from Correlation to Causality

At this stage, metrics might reveal correlations, but what's more valuable is understanding causality.

Causality implies a direct relationship between two variables, where a change in one variable causes a change in the other. Correlation, on the other hand, only indicates an association between the variables, without implying causality.

To move from mere correlation to established causality, a substantial amount of historical data is indispensable. Data scientists can sift through this data, seeking correlations and formulating causal relationships to test hypotheses (the defined objectives and results). This step might require incorporating customer-measured data for A/B testing or other causality-establishing methodologies.
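
The sketch below illustrates the gap this paragraph describes, using made-up weekly data: computing a correlation between code reviews and escaped bugs is easy, but the result is only a hypothesis about causality until it is tested with an intervention such as an A/B test. It assumes Python 3.10+ for `statistics.correlation`.

```python
from statistics import correlation  # Pearson correlation, Python 3.10+

# Illustrative weekly data: number of code reviews and bugs reported in production.
code_reviews = [12, 15, 9, 20, 18, 7, 14, 22]
escaped_bugs = [ 8,  6, 9,  4,  5, 10, 7,  3]

r = correlation(code_reviews, escaped_bugs)
print(f"Pearson r = {r:.2f}")  # strong negative correlation in this made-up sample

# A correlation like this is only a hypothesis: it could be driven by a confounder
# (e.g. more experienced teams both review more and introduce fewer bugs).
# Establishing causality requires an intervention, such as an A/B test in which the
# review process is changed for a randomly chosen group while everything else is held fixed.
```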

With a solid grasp of causality, teams can create dashboards that truly aid in understanding how to enhance outcomes. However, this isn’t a quick-fix solution. It demands time, investment, and a commitment to allow for maturity in the process. Rushing or half-baked attempts are likely to exhaust the best of teams, causing frustration, demotivation, and eventually leading to attrition of valuable talents.

Illustrating the Journey: A Practical Scenario

Let's elucidate the journey from data quality to causality through a hypothetical scenario in a software development company named DevX:

Step 1: Cultivating Data Quality

DevX has always been keen on improving its processes. The management decides to invest in tools and training to ensure accurate data collection from the get-go. They establish a governing structure that emphasizes the importance of accurate data logging and consistency across all teams. Over time, the culture within DevX evolves, and a mature, data-driven environment flourishes.

Step 2: Defining Objectives Using the OKR Framework

With quality data at their fingertips, DevX adopts the OKR framework to align every team’s efforts with the company’s overarching goals. They define clear objectives like reducing bug backlog, improving code maintainability, and enhancing user experience. Key results are set to quantify the success in achieving these objectives, for instance, reducing bug backlog by 30% in the next quarter.

Step 3: Transitioning from Correlation to Causality

As the teams at DevX delve into metrics, they initially uncover correlations such as the one between the number of code reviews and the reduction in bug backlog. However, they aspire to understand the causality to make informed decisions. By investing in data science expertise and compiling substantial historical data, they start analyzing the causal relationships. For instance, they find that not just the number, but the quality of code reviews significantly impacts bug reduction.

Step 4: Establishing Causality through A/B Testing

To further validate the causal relationships, DevX employs A/B testing. They split their development teams into two groups. Group A continues with the existing code review process, while Group B adopts a more rigorous code review process. The results show a significant reduction in bug backlog for Group B, corroborating the causal relationship.
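
A sketch of how such a comparison could be evaluated follows, with invented bug counts per team; a Mann-Whitney U test is one reasonable choice because it does not assume the counts are normally distributed. This is only an illustration of the step, not the fictional DevX's actual analysis.

```python
from scipy import stats

# Illustrative bugs found in production per team over the experiment period.
group_a = [14, 11, 16, 13, 15, 12]   # existing code review process
group_b = [ 8,  9,  7, 10,  6,  9]   # more rigorous code review process

# Mann-Whitney U test: makes no normality assumption about the bug counts.
# H0: the more rigorous process does not reduce the number of bugs.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="greater")

effect = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
print(f"Mean difference: {effect:.1f} bugs per team (p={p_value:.4f})")
```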

Step 5: Informed Decision-Making

Armed with insights from causal relationships, DevX now makes more informed decisions. They refine their code review process, aligning it closely with their objectives of reducing bug backlog and improving code maintainability. The dashboards are updated to reflect not just correlations, but causality, providing a clearer picture of how their actions impact their objectives.

This scenario illustrates the meticulous journey from ensuring data quality to understanding causality. It emphasizes the patience, investment, and the disciplined approach required to leverage metrics effectively for improved decision-making and ultimately, achieving the defined objectives.

Conclusion

Measuring in software development is a process that requires a significant investment in time and resources. It is essential to adopt a structured approach, clearly define objectives, and ensure data quality before embarking on metric measurement and analysis. Only then will metrics serve as a valuable tool to improve team performance and achieve the desired objectives.
