The impact of AI tooling on engineering at ANZ Bank
This is the latest issue of my newsletter. Each week I share research and perspectives on developer productivity. Subscribe here to get future issues.
This week I read The Impact of AI Tooling on Engineering at ANZ Bank by members of ANZ’s Architecture & Engineering organizations. ANZ was interested in the potential productivity gains of using GitHub Copilot, so they conducted an experiment with a smaller group of engineers to help determine whether it should be rolled out to the broader organization. This paper describes the experiment’s setup and results.
My summary of the paper
To evaluate whether Copilot should be used org-wide, the authors of this paper conducted a six-week experiment comparing the tool’s impact on a test group versus a control group. They based their evaluation on measures of productivity, quality, and security.
Experiment design
The experiment included two weeks of preparation and four weeks of actual testing, comparing a test group of engineers using Copilot against a control group working without it.
The experiment demonstrated that Copilot significantly reduced the time engineers take to complete tasks and positively influenced their ability to perform specific functions. However, the research team found no statistically significant improvements in code quality or security as a result of using the tool.
Here’s a closer look at the results:
Impact on speed
Throughout the experiment, participants recorded the time they took to complete each challenge. This data allowed the research team to calculate and compare the average time spent on tasks by both the Copilot group and the control group.
The findings were notable: the group using Copilot completed their tasks 42.36% faster than the control group. Specifically, the control group took an average of 30.98 minutes per task, while the Copilot group averaged 17.86 minutes.
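As a sanity check, the reported group means reproduce the headline figure. This is a minimal sketch using only the two averages quoted above; the hundredth-of-a-point gap to the published 42.36% presumably comes from rounding in the reported means.

```python
# Reproduce the headline speed improvement from the reported group means.
control_mean = 30.98   # control group: average minutes per task (from the paper)
copilot_mean = 17.86   # Copilot group: average minutes per task (from the paper)

# Relative reduction in average task completion time.
speedup_pct = (control_mean - copilot_mean) / control_mean * 100
print(f"{speedup_pct:.2f}% faster")  # prints "42.35% faster"
```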
When we look closer at the impact for engineers with different levels of Python proficiency, we can see that Copilot was beneficial for participants of all skill levels, but it was most helpful for those who were ‘Expert’ Python programmers.
This is intriguing because it conflicts with a GitHub study which found that developers with less programming experience benefited the most from Copilot. The GitHub study also measured task completion time but did not restrict participants to a specific programming language, unlike the ANZ study. It’s possible that the ‘Expert’ Python programmers at ANZ were more effective at using Copilot; however, this is not certain.
Participants also reported the difficulty of each task. We can see that Copilot gave the largest improvement when completing ‘Hard’ tasks. This observation makes sense: harder tasks have more opportunities where AI-assisted tools can help.
As for measures of quality and security, the Copilot group had a 12.86% higher unit test success ratio; however, this result was not statistically significant. The experiment was also unable to generate meaningful data to measure code security, although the data suggests that Copilot did not introduce any major security issues into the code.
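To make “not statistically significant” concrete, here is a minimal sketch of the kind of check a team could run on a unit-test success ratio: a two-sided, two-proportion z-test in plain Python. The pass/fail counts below are invented for illustration (the paper reports only the 12.86% ratio difference), and the paper does not specify which statistical test ANZ actually used.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions, using the pooled estimate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF, computed via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts (NOT from the paper): unit tests passed out of total attempted.
z, p = two_proportion_z_test(27, 40, 24, 40)  # Copilot group vs. control group
print(f"z = {z:.2f}, p = {p:.3f}")  # p > 0.05 here, so the gap would not be called significant
```

With a small sample like this, even a ~12% higher pass ratio yields p > 0.05, which illustrates how an observed improvement can still fail a significance test.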
Impact on developer experience
Across all areas, engineers responded positively regarding GitHub Copilot. They felt it helped them review and understand existing code, create documentation, and test their code. Additionally, they felt Copilot helped them spend less time debugging and reduced their overall development time. They also found the suggestions provided by Copilot to be somewhat helpful and generally in line with their project’s coding standards. It should be noted, however, that while sentiment was positive, it was moderate.
Ultimately, the experiment provided clear results about Copilot’s impact on the speed and ease of completing tasks in engineering. The authors recommended its wider adoption, and by the time the paper was published, over 1,000 users had already integrated Copilot into their workflows.
Final thoughts
While the findings from this study are interesting, I’m mostly inspired by how the organization approached its adoption of Copilot. In its simplest form, they established a baseline, ran an A/B test, and selected a range of metrics to assess the tool’s impact. It’s a great example for organizations looking to evaluate the effectiveness of a tool and determine whether it should be adopted on a larger scale.
Measure GenAI adoption and impact
We recently published a free guide on how to measure adoption and impact of AI tools like GitHub Copilot. You can get a copy of the guide here.
Who’s hiring right now
Here’s a roundup of new Developer Experience job openings:
Find more DevEx job postings here.
That’s it for this week. Thanks for reading.
-Abi
Tech Lead | Cloud | DevSecOps
5 months ago: Certainly some highlights in the experiment design to learn from. I wonder what the ANZ experiment team would recommend or do differently next time. Curious whether the weekly Python programming problems were drawn from ANZ’s codebases, and whether they were solvable independently, without consulting others for clarification. A 42% speed-up on well-specified, independent programming challenges cannot be generalised to a 42% gain on engineering tasks; engineering tasks rarely come with a blank canvas and precise requirements.
Director Of Applications Development at IDEA Public Schools
7 months ago: Our team has been conducting a similar evaluation of the impact of AI-assisted development over the past year, using simple metrics of lead time and cycle time as the measure. What we found is that as prompt quality improved over time, so did the gains in velocity. The overall improvement was about a 38% gain in velocity, which supports the paper’s findings. Additionally, we discovered that the gain was higher the more senior the developer, which may seem counterintuitive; however, it pointed to the need for a different prompting approach for junior developers, one that ensures they are still actively learning while benefiting from AI tools.
Senior Staff Engineer at Lendable
7 months ago: Overall I think this study is very misleading. It doesn’t show that engineers can solve all tasks 42% quicker; it shows the tool helps them solve tasks that cover a very small part of what they do and that the models are already heavily trained on. I’d almost say this provides little value, if any, in the real world. Now that they have run the test, presumably to justify a rollout across all teams, is the business now expecting 42% more output from the engineering teams?