How to measure the impact of GitHub Copilot on engineering productivity
Most software delivery teams are considering adopting AI in some form to help engineers accelerate value delivery and improve delivery effectiveness.
GitHub Copilot is one of the first examples of AI-powered engineering assistance. It styles itself as “your AI pair programmer” and can autocomplete lines of code.
Early adopters report ‘productivity improvements’ of up to 20% using GitHub Copilot. Still, it is not cheap (at roughly $10 per user per month), so how can you build a business case for it?
What methodology (and metrics) should you use to accurately assess the impact of a tool like GitHub Copilot on productivity and value delivery?
GitHub Copilot: first-generation AI assistance for software engineers
Copilot is clearly the start of a long and accelerating journey as AI is applied to many areas of the SDLC. It uses OpenAI Codex to suggest code and entire functions in real time from your editor.
Other AI tools arrive almost daily and can help software engineers save time and work more efficiently in a myriad of ways. Here are just a few examples:
- Grit.io helps manage technical debt
- Mintlify provides automated documentation for developers
- Code AI helps translate (some) languages, debug, navigate code and act as a pair programmer
- Tools like AdrenalineAI use AI to improve understanding of your codebase
So, at a time when cost control is the order of the day, you may be wondering how to accurately quantify the impact of tools like these and justify the added expense.
A methodology for measuring the impact of a tool like GitHub Copilot
To robustly measure the impact of GitHub Copilot (and similar AI engineering-enhancement tools), the methodology must be:
- Quantitative – based on hard, measurable data
- Holistic – considering all benefits and potential impacts across the end-to-end SDLC (software delivery lifecycle)
- Balanced – including subjective survey data alongside software delivery data
This requires a metrics scorecard that fully captures the benefits and potential costs of GitHub Copilot.
The metrics reflect the SPACE framework for measuring developer productivity, emphasising the key areas GitHub Copilot is likely to impact.
These metrics can be tracked over time for a representative group of GitHub Copilot users to see the ‘before and after’ effect. We suggest that a representative sample should include engineers of different seniority and activity levels, and that the analysis period should cover at least three sprint cycles (e.g. 6+ weeks).
Key metrics to quantify the impact of GitHub Copilot
An end-to-end software delivery analytics platform like Plandek provides a single pane of glass to measure the real impact of a tool like GitHub Copilot.
It surfaces a range of engineering and software delivery metrics to capture the impact of GitHub Copilot on five key variables that determine ‘productivity’:
- Velocity and throughput – measures of team ‘output’
- Time to value – time taken to deliver an increment of software
- Quality
- Dependability – a key benefit if teams deliver against their plans more reliably
- Developer satisfaction – the impact of speeding up repetitive or less interesting tasks
These metrics can be tracked over time for a group of GitHub Copilot users versus a control group of non-users.
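One way to combine the ‘before and after’ view with a control group is a difference-in-differences comparison. This is a standard analysis technique rather than something the article prescribes; the sketch below assumes you already have one per-engineer metric (for example, tickets completed per sprint) for both groups in a baseline window and an evaluation window. The function names and all figures are hypothetical.

```python
from statistics import mean


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the Copilot group minus change in the control group.

    Each argument is a list of per-engineer values for one metric
    (e.g. tickets completed per sprint) in one time window.
    """
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


def relative_uplift_pct(effect, treated_before):
    """Express the estimated effect relative to the Copilot group's baseline."""
    baseline = mean(treated_before)
    if baseline == 0:
        raise ValueError("baseline mean must be non-zero")
    return effect / baseline * 100


# Hypothetical per-engineer throughput figures, for illustration only.
copilot_before, copilot_after = [11, 12, 13], [14, 13, 15]
control_before, control_after = [12, 11, 13], [13, 12, 14]

effect = difference_in_differences(copilot_before, copilot_after,
                                   control_before, control_after)
print(f"Estimated effect: {effect:.2f} tickets per engineer per sprint")
print(f"Relative uplift: {relative_uplift_pct(effect, copilot_before):.1f}%")
```

Subtracting the control group's change strips out shifts that affect everyone (a quieter sprint, a release freeze), which a naive before/after comparison would wrongly attribute to Copilot. It still assumes both groups would otherwise have followed similar trends.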
1. Velocity and Throughput
Throughput is a core measure of ‘output’ over time for Scrum and Kanban teams and can be calculated in tickets, story points, pull requests, builds or value points. A tool like Plandek can easily calculate Throughput per engineer for users and non-users of GitHub Copilot and express the difference as a percentage increase.
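A minimal sketch of that per-engineer comparison, assuming you have already counted completed items (in whichever unit you choose) for each group over the same window. The `Group` type and the figures are hypothetical and are not Plandek's implementation.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A cohort of engineers observed over the same analysis window."""
    name: str
    completed_items: int  # tickets, story points, PRs, etc. completed in the window
    engineers: int


def throughput_per_engineer(group: Group) -> float:
    """Completed items per engineer over the analysis window."""
    if group.engineers <= 0:
        raise ValueError("group must contain at least one engineer")
    return group.completed_items / group.engineers


def percent_increase(treated: float, baseline: float) -> float:
    """Relative change of the treated group versus the baseline, in %."""
    if baseline == 0:
        raise ValueError("baseline throughput must be non-zero")
    return (treated - baseline) / baseline * 100


# Hypothetical counts for illustration only.
copilot = Group("Copilot users", completed_items=132, engineers=10)
control = Group("Non-users", completed_items=120, engineers=10)
uplift = percent_increase(throughput_per_engineer(copilot),
                          throughput_per_engineer(control))
print(f"Throughput per engineer uplift: {uplift:.1f}%")
```

Normalising per engineer matters when the two groups differ in size; comparing raw team totals would mostly measure headcount.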
Sprint Velocity measures the rate of work completed within a sprint and how it varies over time. It can be calculated in tickets or story points. Advanced analytics tools like Plandek will also show you the amount of work carried over from sprint to sprint, giving an even better underlying measure of delivery.
This would be a key metric when considering the impact of GitHub Copilot.
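The sketch below computes average velocity and a simple carry-over ratio. It assumes carry-over is committed points minus completed points, floored at zero when extra work is pulled in mid-sprint; that definition, the `Sprint` type and the sample figures are hypothetical, and analytics tools may define carry-over differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sprint:
    committed_points: int  # story points committed at sprint start
    completed_points: int  # story points completed by sprint end


def average_velocity(sprints):
    """Mean completed story points per sprint."""
    return mean(s.completed_points for s in sprints)


def carried_over_points(sprint):
    """Committed work not completed, assumed to roll into the next sprint."""
    return max(sprint.committed_points - sprint.completed_points, 0)


def carry_over_ratio(sprint):
    """Share of committed work carried over (0.0 to 1.0)."""
    if sprint.committed_points == 0:
        return 0.0
    return carried_over_points(sprint) / sprint.committed_points


# Hypothetical sprint history, for illustration only.
history = [Sprint(40, 34), Sprint(42, 38), Sprint(38, 36)]
print("Average velocity:", average_velocity(history))
for i, s in enumerate(history, start=1):
    print(f"Sprint {i}: {carry_over_ratio(s):.0%} carried over")
```

Tracking carry-over alongside velocity helps separate "the team completed more" from "the team simply committed to more".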
2. Time to Value
Cycle Time is a core agile software delivery metric that tracks an organisation’s ability to deliver software early and often. It measures the time taken to deliver an increment of software, from dev start to deployment. The shorter the Cycle Time, the shorter the feedback loops, and hence the more quickly the organisation can deliver new features and respond to customer needs. This is a vital KPI when assessing technology delivery efficiency.
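A minimal sketch of the Cycle Time calculation, assuming each work item has a dev-start timestamp (for example, when a ticket moves to "In Progress") and a deployment timestamp; where those come from depends on your tooling. Using the median is a design choice, since a few long-running items often skew the mean. Function names and dates are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median


def cycle_time(dev_start: datetime, deployed: datetime) -> timedelta:
    """Elapsed time from development start to deployment for one work item."""
    if deployed < dev_start:
        raise ValueError("deployment cannot precede development start")
    return deployed - dev_start


def median_cycle_time_days(items):
    """Median Cycle Time in days over (dev_start, deployed) pairs."""
    days = [cycle_time(start, end) / timedelta(days=1) for start, end in items]
    if not days:
        raise ValueError("no completed items")
    return median(days)


# Hypothetical work items, for illustration only.
items = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 5, 9, 0)),   # 4 days
    (datetime(2024, 3, 2, 9, 0), datetime(2024, 3, 4, 9, 0)),   # 2 days
    (datetime(2024, 3, 3, 9, 0), datetime(2024, 3, 10, 9, 0)),  # 7 days
]
print(f"Median Cycle Time: {median_cycle_time_days(items):.1f} days")
```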
Code Cycle Time typically accounts for 20–30% of overall Cycle Time. It measures the average time from a pull request (PR) being opened until it is merged or closed. The bulk of this time is usually spent in the approval process.
In theory, GitHub Copilot enables quicker, easier development, so developers should have more capacity to review each other’s PRs. If code quality also improves, reviews should result in fewer requested changes and shorter approval times.
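A minimal sketch of Code Cycle Time as the mean hours from PR opened to merged/closed, plus its share of overall Cycle Time. It assumes you have exported the two timestamps per PR (GitHub pull request objects carry `created_at`, `merged_at` and `closed_at` fields); the function names and sample PRs are hypothetical.

```python
from datetime import datetime
from statistics import mean


def hours_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def average_code_cycle_time_hours(prs):
    """Mean hours from PR opened to merged/closed.

    `prs` is an iterable of (opened_at, merged_or_closed_at) pairs.
    """
    durations = [hours_between(opened, closed) for opened, closed in prs]
    if not durations:
        raise ValueError("no closed pull requests")
    return mean(durations)


def share_of_cycle_time_pct(code_cycle_hours: float, overall_cycle_hours: float) -> float:
    """Code Cycle Time as a percentage of overall Cycle Time."""
    return code_cycle_hours / overall_cycle_hours * 100


# Hypothetical PRs, for illustration only.
prs = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 15, 0)),   # 6 h
    (datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 5, 10, 0)),  # 24 h
]
code_hours = average_code_cycle_time_hours(prs)
print(f"Average Code Cycle Time: {code_hours:.1f} h")
print(f"Share of a 60 h Cycle Time: {share_of_cycle_time_pct(code_hours, 60):.0f}%")
```

Running this separately for Copilot users' and non-users' PRs gives a direct test of the hypothesis that reviews get faster.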
3. Quality
Escaped Defects is a simple but effective measure of overall software delivery quality. It can be tracked in numerous ways, but most involve tracking defects by criticality/priority.
Any analysis of delivery efficiency before and after the introduction of GitHub Copilot should consider Escaped Defect rates, as it would be a poor trade-off to increase velocity and ‘productivity’ at the expense of quality.
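A minimal sketch of tracking Escaped Defects by priority and normalising them per 100 delivered items. Normalising is an assumption on our part, but a useful one: if Copilot raises throughput, raw defect counts may rise simply because more is shipped. The `priority`/`found_in` keys and sample data are hypothetical stand-ins for fields in your issue tracker.

```python
from collections import Counter


def escaped_defects_by_priority(defects):
    """Count defects found in production, grouped by priority.

    Each defect is a dict with hypothetical 'priority' and 'found_in' keys.
    """
    return Counter(d["priority"] for d in defects if d["found_in"] == "production")


def escaped_defects_per_100_items(defects, delivered_items: int) -> float:
    """Escaped defects normalised per 100 delivered work items."""
    if delivered_items <= 0:
        raise ValueError("delivered_items must be positive")
    escaped = sum(escaped_defects_by_priority(defects).values())
    return escaped * 100 / delivered_items


# Hypothetical defect records, for illustration only.
defects = [
    {"priority": "P1", "found_in": "production"},
    {"priority": "P2", "found_in": "production"},
    {"priority": "P2", "found_in": "qa"},  # caught before release, not escaped
    {"priority": "P3", "found_in": "production"},
]
print(escaped_defects_by_priority(defects))
print(f"{escaped_defects_per_100_items(defects, 150):.1f} escaped defects per 100 items")
```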
Build Failure Rate identifies the percentage of builds that fail and the overall risk this poses to a team working productively. Notable changes in the failure rate after introducing GitHub Copilot may indicate that code quality has been affected.
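A minimal sketch of Build Failure Rate compared before and after the rollout. It assumes build outcomes have already been mapped from your CI tool's statuses onto the hypothetical labels `"passed"` and `"failed"`; the sample outcomes are invented for illustration.

```python
def build_failure_rate_pct(outcomes) -> float:
    """Percentage of builds whose outcome is 'failed'."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no builds in window")
    failed = sum(1 for o in outcomes if o == "failed")
    return failed * 100 / len(outcomes)


# Hypothetical CI outcomes before and after Copilot was introduced.
before = ["passed"] * 45 + ["failed"] * 5
after = ["passed"] * 52 + ["failed"] * 8

rate_before = build_failure_rate_pct(before)
rate_after = build_failure_rate_pct(after)
print(f"Before: {rate_before:.1f}%  After: {rate_after:.1f}%")
print(f"Change: {rate_after - rate_before:+.1f} percentage points")
```

Reporting the change in percentage points, rather than as a relative percentage, avoids overstating small shifts in an already low failure rate.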
4. Dependability
Sprint Target Completion tracks the percentage of sprint goals achieved each cycle. ‘Scrum Teams’ and ‘Sprints’ are the basic building blocks of Scrum Agile software delivery. If Scrum Teams consistently deliver their sprint goals, Agile software delivery becomes relatively dependable, enabling delivery outcomes to be predicted across multiple teams and longer time periods.
Scrum team predictability is, therefore, a critical success criterion in Agile software delivery. If GitHub Copilot improves the likelihood of a team delivering its tickets faster and with fewer bugs, this is a major contributor to the overall improvement in effectiveness.
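A minimal sketch of Sprint Target Completion per sprint and on average. Using the spread (population standard deviation) of completion as a predictability signal is our own assumption, not part of the metric's definition; the sprint data are hypothetical.

```python
from statistics import mean, pstdev


def sprint_target_completion_pct(goals_met: int, goals_set: int) -> float:
    """Percentage of a sprint's goals achieved."""
    if goals_set <= 0:
        raise ValueError("a sprint needs at least one goal")
    return goals_met * 100 / goals_set


# Hypothetical (goals met, goals set) for three sprints.
sprints = [(4, 5), (5, 5), (3, 4)]
completion = [sprint_target_completion_pct(met, total) for met, total in sprints]

print("Per sprint:", completion)
print(f"Mean completion: {mean(completion):.1f}%")
# A lower spread suggests more predictable delivery.
print(f"Spread (population std dev): {pstdev(completion):.1f} points")
```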
5. Developer Satisfaction
eNPS (employee Net Promoter Score) tracks employee satisfaction and loyalty within teams and organisations. Anecdotal reports suggest that developers find GitHub Copilot makes the more tedious aspects of coding less taxing and has a positive impact on wellbeing. An eNPS survey makes this straightforward to validate and quantify.
Although developer satisfaction is an important factor in productivity measurement, it shouldn’t be viewed in isolation from the other metrics when quantifying overall developer productivity.
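A minimal sketch of the standard eNPS calculation, assuming responses on the usual 0–10 scale: promoters score 9–10, detractors 0–6, and eNPS is the percentage of promoters minus the percentage of detractors. The function name and sample responses are hypothetical.

```python
def enps(scores) -> float:
    """Employee Net Promoter Score from 0-10 survey responses.

    Promoters score 9-10, detractors 0-6, passives 7-8.
    eNPS = % promoters - % detractors, giving a value from -100 to +100.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("no survey responses")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return (promoters - detractors) * 100 / len(scores)


# Hypothetical responses from Copilot users, for illustration only.
responses = [10, 9, 9, 8, 7, 6, 5, 10, 9, 3]
print(f"eNPS: {enps(responses):+.0f}")
```

Running the same survey for users and non-users, before and after rollout, lets eNPS feed into the same comparison as the delivery metrics.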
The above are some examples of relevant metrics to consider when analysing the impact of GitHub Copilot on delivery productivity. The key is to use a balanced set of metrics that treats software delivery holistically, as the complex process it is.
Combining the balanced scorecard of metrics to create a business case for GitHub Copilot
Typically, we combine data from the ‘balanced scorecard’ of metrics discussed above, using simple weightings, to create an overall Productivity Impact Assessment (PIA) of GitHub Copilot. See the table below:
GitHub Copilot Productivity Impact Assessment – example template
The weighted average productivity improvement calculated in the PIA can then be applied to the estimated cost of the delivery capability (headcount × fully loaded staff costs). This provides a monetary value for the productivity improvement, based on resource costs. It excludes the potentially larger benefit of delivering more value to customers earlier, which is not easily (or necessarily) calculated.
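A minimal sketch of that arithmetic: a weighted average of per-metric improvements, applied to headcount × fully loaded annual cost, then netted against licence cost at the roughly $10 per user per month cited above. The weights, improvements, headcount and salary figure are all hypothetical and are not taken from the PIA template.

```python
def weighted_improvement_pct(scorecard) -> float:
    """Weighted average improvement across the scorecard.

    `scorecard` maps metric name -> (weight, improvement in %).
    Weights need not sum to 100; they are normalised here.
    """
    total_weight = sum(weight for weight, _ in scorecard.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weight * pct for weight, pct in scorecard.values()) / total_weight


def annual_productivity_value(improvement_pct, headcount, fully_loaded_cost):
    """Apply the improvement to the annual cost of the delivery capability."""
    return headcount * fully_loaded_cost * improvement_pct / 100


# Hypothetical weights and measured improvements, for illustration only.
scorecard = {
    "Velocity and throughput": (25, 6.0),
    "Time to value": (25, 3.0),
    "Quality": (20, -1.0),  # a small regression still counts against the total
    "Dependability": (20, 4.0),
    "Developer satisfaction": (10, 10.0),
}
pia = weighted_improvement_pct(scorecard)
value = annual_productivity_value(pia, headcount=50, fully_loaded_cost=150_000)
licence_cost = 50 * 10 * 12  # ~$10 per user per month, 12 months
print(f"PIA: {pia:.2f}%")
print(f"Annual productivity value: ${value:,.0f}")
print(f"Net of licence cost: ${value - licence_cost:,.0f}")
```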
Productivity improvements from using GitHub Copilot – the empirical data
There is a distinct lack of independent data in this area.
GitHub’s own survey of 2,000 developers found that 88% claimed ‘to be more productive’ when using the tool, while in a task test undertaken by 95 developers, the group using GitHub Copilot was 55% faster and had a 7% higher rate of completing the task (see below).
GitHub’s own survey data – the impact of GitHub Copilot on users (2022)
Our own analyses, using a PIA like the one above, show improvements of circa 5%. However, this figure is likely to rise as AI technology continues to improve rapidly.
About Plandek
Plandek is an intelligent analytics platform that helps software delivery teams deliver valuable software faster and more predictably.
Plandek enables technology teams to track and drive their improvement and share understandable KPIs with stakeholders interested in accelerating value creation and improving delivery efficiency. As such, Plandek is a key global vendor in the fastest-growing area of DevOps, known as Value Stream Management.
Plandek works by mining data from delivery teams’ toolsets (such as issue tracking, code repos and CI/CD tools) to provide actionable, intelligent insight across the end-to-end software delivery process for users throughout the delivery team, from Team Lead to the CIO.
Plandek is recognised as a top global vendor by Gartner and Forrester and is used by private and public organisations globally to optimise their technology delivery.
For more information, please visit www.plandek.com
We have recently written a new article about measuring the impact of GenAI on engineering productivity and the software delivery process: https://bit.ly/3SSdcJb
Thanks for the interesting article. I have a few questions, though, without which the article lacks 'meat':
1. GitHub states how many people took part in its test. In your case, how many people were working with and without Copilot enabled?
2. How long did the test run?
3. In GitHub's test, individuals worked on the same task. Was that the case for you, or were people working on different activities? This can affect the results.
4. GitHub measured only one part of the SDLC: coding. Do I understand correctly that you took a holistic approach, looking at the results of full Scrum sprints (including meetings, requirements discovery, builds, tests, deployments, etc.)? It would be great to understand this before drawing any conclusions.
I also believe that GitHub in a way 'lures' us into thinking 'how great Copilot is', since greenfield-like activities will benefit greatly from Copilot, whereas activities that require understanding a complex software setup and the 'environment' into which we introduce changes are more sophisticated and will not be impacted as much by tools like Copilot. It would be interesting to know your thoughts on this.