Measuring the Business Value of GitHub Copilot
Copilot Impact Modeling


The most common benefit developers see from using GitHub Copilot is time savings.

It's easy for developers to quantify the time saved from using GitHub Copilot by reflecting on completed work and making a counterfactual estimate of the percentage impact: Copilot made the task 10% faster, or the task would have taken twice as long without Copilot (a 50% time savings).
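As a quick illustration of that counterfactual arithmetic, here is a minimal sketch; the helper name and the sample numbers are just for illustration:

```python
def time_savings_pct(actual_hours, estimated_hours_without_copilot):
    """Counterfactual time savings: share of the 'without Copilot' time that was avoided."""
    return 100 * (1 - actual_hours / estimated_hours_without_copilot)

# A task took 2 hours with Copilot but would have taken 4 hours without it:
print(time_savings_pct(2, 4))    # 50.0 -> "twice as long without Copilot" = 50% faster
# A task took 4.5 hours with Copilot, estimated at 5 hours without:
print(time_savings_pct(4.5, 5))  # 10.0 -> "Copilot made the task 10% faster"
```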

Developers use Copilot in many different ways and for different subtasks, which leads to different levels of impact.

Example counterfactual estimates from developers:

  • 11-20% Time Savings: “Asked Copilot Chat how to use a specific command and got my answer—no need to search through CLI documentation.”
  • 21-30% Time Savings: “For about half of this PR, I completed the first quarter manually, then told GitHub Copilot, ‘See what I did at lines M–N in this file? Do that for…’ It wrote the next quarter of the code for me.”
  • 31-40% Time Savings: “Copilot helped format large data sets for a test—something that would have taken much longer to do manually.”
  • More than 41% Time Savings: “Was able to tab out the entire process.”

We can cross-check, or validate, these developer-reported estimates in several ways.

1) The first way is to compare against results from controlled studies. The graphic below shows what researchers find when they compare how long it takes AI-assisted and non-AI-assisted users to complete the same tasks:

  • The GitHub 2022 study found that Copilot can reduce task duration by 55% on average.
  • The McKinsey study shows that time savings vary by task, but even complex tasks see up to 10% time savings.
  • The MIT study shows that certain subtasks (writing rough drafts) see more time savings than others (brainstorming). It also showed that some subtasks (editing) actually take more time.

If we assume developers spend about 30% of their time coding, then we can estimate hours saved per week from their counterfactual estimates of time savings. The graphic below is one of many examples of research finding that developers spend about 30% of their time on code.



Simple math shows that a 20-50% time savings, applied to the 30% of a 40-hour week spent coding (12 hours), works out to 2.4-6 hours per week. See other possible combinations and the resulting hours saved per week in the lookup table below:


Copilot Time Savings Lookup Table
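Since the lookup table graphic isn't reproduced here, here is a minimal sketch of the same arithmetic, assuming a 40-hour week; the coding-time shares and savings levels shown are illustrative:

```python
WEEKLY_HOURS = 40  # assumed work week

def hours_saved_per_week(coding_share, time_savings, weekly_hours=WEEKLY_HOURS):
    """Hours saved per week = weekly hours x share of time spent coding x savings on coding time."""
    return weekly_hours * coding_share * time_savings

# Rebuild the lookup table for a few coding-time shares and savings levels
print("coding share | 20% saved | 30% saved | 40% saved | 50% saved")
for coding_share in (0.30, 0.40, 0.50):
    row = [hours_saved_per_week(coding_share, s) for s in (0.20, 0.30, 0.40, 0.50)]
    print(f"{coding_share:>12.0%} | " + " | ".join(f"{h:9.1f}" for h in row))

# At 30% coding time, 20-50% savings works out to 2.4-6.0 hours per week,
# matching the "simple math" above; at 50% coding time it tops out at 10 hours.
```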


We can see that the numbers reported by developers, the findings of researchers, and the relationship between time spent coding and time saved all tend to tell the same story: developers are able to save 20-50% of dev task time by using Copilot. This impact level is consistent between controlled studies and time savings reported as part of day-to-day work, and it adds up to roughly 2-10 hours saved per week.

2) The second way to validate the developer estimates is to compare the level of time savings estimated with the type of use reported. When we stack-rank and group use cases/scenarios by level of impact, we can easily see that users getting 10-20% time savings, for example, are using Copilot very differently than those getting 40-50% time savings.

Developers reporting similar impact and similar use cases are essentially confirming each other's reports.

3) The last way to validate the self-reported time savings is to compare them with throughput measurements like PR rate or story-point velocity. Because so many factors affect throughput metrics, development orgs can only measure Copilot's effect on throughput with careful controls for those factors. Such organizations need highly disciplined and efficient SDLC processes as well as room for growth. In those rare situations, the estimates of time savings have been consistent with the throughput improvement measured.

For example, in one case developers estimated an average time savings of 6 hours per week. During this period, the org's dashboards measured an increase in story points of 9% in one case and 14% in another. As a side note, the orgs that are able to link Copilot to throughput are typically very disciplined and efficient in their dev practices. They also tend to have sophisticated, mostly homegrown, mature platforms for capturing and normalizing activity data.
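As a rough consistency check (the 40-hour week is an assumption, not a measurement from that org), the reported hours saved and the measured throughput increases are at least in the same ballpark:

```python
# Rough consistency check: is a 9-14% story-point increase plausible given
# a reported 6 hours/week of time savings? (All figures here are illustrative.)
weekly_hours = 40                  # assumed work week
hours_saved = 6                    # developer-reported average
measured_increases = (0.09, 0.14)  # measured story-point increases

# Ceiling: prior output now takes (40 - 6) hours, so reinvesting every saved hour
# could raise throughput by at most this factor:
ceiling = hours_saved / (weekly_hours - hours_saved)
print(f"Theoretical ceiling on throughput gain: {ceiling:.0%}")  # ~18%

for inc in measured_increases:
    status = "consistent with" if inc <= ceiling else "above"
    print(f"Measured {inc:.0%} is {status} the ceiling")
```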

Why can't we rely on throughput measurements to capture Copilot time savings?

There are many reasons. First, many external factors affect throughput, such as the number of productive work hours per week, which varies with holidays, vacations, outages, code freezes, quarter-end crunches, pending deadlines, and so on.

Beyond these external factors, we also can't rely on throughput measures for internal reasons. The graphic below shows that the same level of Copilot time savings can be large and noticeable for a developer, but "gets diluted" when measured at the team level and is essentially invisible (a 1-3% level of impact) when viewed through an end-to-end measure like "Deployment Frequency".
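Since the dilution graphic isn't reproduced here, here is a minimal sketch of how it plays out; the specific shares are illustrative assumptions, not measurements:

```python
# Illustrative dilution of a 30% coding-time savings as the measurement lens widens.
coding_savings = 0.30            # savings a developer sees on coding tasks
coding_share_of_dev_time = 0.30  # coding vs. meetings, reviews, support, ...
dev_share_of_lead_time = 0.30    # dev work vs. planning, waiting, testing, release, ...

dev_level = coding_savings                              # 30% - obvious to the developer
team_level = coding_savings * coding_share_of_dev_time  # ~9% - visible but muted
e2e_level = team_level * dev_share_of_lead_time         # ~2.7% - lost in deployment-frequency noise

for label, value in [("Developer (coding tasks)", dev_level),
                     ("Team (all dev activities)", team_level),
                     ("End-to-end (Deployment Frequency)", e2e_level)]:
    print(f"{label:<35} {value:.1%}")
```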


The dilution illustrated above makes it clear that the value of AI to software development shows up at the developer level, not at the process or organization level. So why even bother adopting GitHub Copilot?

Simple: it's the best business case you will ever see. With over 4,000% ROI, no business can afford to let this opportunity pass it by, despite the seemingly insignificant throughput improvement.


The Business Case for GitHub Copilot for 1000 Devs coding 40% of the time
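Since the business-case graphic isn't reproduced here, here is a minimal sketch of the kind of arithmetic behind it. The time-savings level, loaded cost, license price, and working weeks are illustrative assumptions, not the figures behind the original graphic:

```python
# Illustrative ROI arithmetic for 1000 developers coding 40% of the time.
devs = 1000
weekly_hours = 40
coding_share = 0.40
time_savings = 0.30         # assumed average savings on coding time
working_weeks = 48          # assumed working weeks per year
loaded_cost_per_hour = 75   # assumed fully loaded developer cost, USD
license_per_dev_month = 19  # assumed Copilot license price, USD

hours_saved_per_dev_year = weekly_hours * coding_share * time_savings * working_weeks
value = devs * hours_saved_per_dev_year * loaded_cost_per_hour
cost = devs * license_per_dev_month * 12
roi = (value - cost) / cost

print(f"Hours saved per dev per year: {hours_saved_per_dev_year:.0f}")  # ~230
print(f"Value of time saved:          ${value:,.0f}")                   # ~$17.3M
print(f"License cost:                 ${cost:,.0f}")                    # $228,000
print(f"ROI:                          {roi:.0%}")                       # well over 4000%
```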

In addition to tracking developer time savings, other dimensions can be included to build a "causal model" of improvement. The causal model connects the factors that must be in place for developers to create downstream impacts with Copilot.

It recognizes that devs must first adopt the tool, achieve consistent activity, achieve consistent time savings, and deliberately allocate those savings toward downstream outcomes. This translates into the following ROI roadmap.


The ROI journey for 90% of 1000 Devs
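Here is a minimal sketch of how those stages chain together into expected downstream impact; apart from the 90% adoption figure in the caption, the stage values are illustrative assumptions:

```python
# Illustrative "causal model": downstream impact only materializes when each stage
# of the roadmap reaches its target, and the stages multiply together.
devs = 1000
adoption_rate = 0.90        # "90% of 1000 Devs" reach adoption
active_rate = 0.85          # assumed share of adopters with consistent weekly activity
hours_saved_per_week = 4.8  # assumed consistent time savings per active dev
allocation_rate = 0.75      # assumed share of saved time deliberately reinvested downstream

reinvested_hours = devs * adoption_rate * active_rate * hours_saved_per_week * allocation_rate
print(f"Hours reinvested into downstream outcomes per week: {reinvested_hours:,.0f}")
# If any stage stalls near zero, downstream impact collapses -- the chain is multiplicative.
```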

When adoption, activity, and time savings all reach their targets, it is reasonable to expect that downstream impact is happening at the target level as well.

In future posts, we'll explore the roadmap in more detail, along with what leadership and developers need to learn to maximize impact.

Dimitar Bakardzhiev

Efficient Product Development

2 weeks ago

I am confident AI code assistants are the future, but this makes no sense: "With over 4000% ROI, no business can afford to let this opportunity pass them by.... despite the insignificant throughput improvement." Why? Because businesses sell not the time saved at Dev level but software produced at Company level. Companies cannot calculate ROI on Dev level because there is no "Return" there. To calculate "Return" on "Investment", where Copilot is the investment you need the Throughput. When developers learn how to use AI code assistants then Throughput will also increase.
