Reducing Code Review Time at Google

This is the latest issue of my newsletter. Each week I share research and perspectives on developer productivity. Subscribe to get future issues.


This week I read Resolving Code Review Comments with Machine Learning from Google. Code reviews are a critical part of the software development process at Google, but they take a significant amount of time. Researchers looked for a way to speed up code reviews while maintaining quality. This paper documents their solution and results.

My summary of the paper

Developers at Google spend a lot of time "shepherding" code changes through reviews, both as authors and reviewers. Even when only a single review iteration is needed, there is a cost involved: it takes time to understand the reviewer's recommendation, look up relevant information, and type out the edit. Moreover, the active work time that the code author must devote to addressing reviewer comments grows almost linearly with the number of comments.

For these reasons, Google created a code review comment-resolution assistant. Their goal: to reduce the time spent resolving code review comments by making it easier for reviewers to provide actionable suggestions and authors to efficiently address those suggestions. Their assistant achieves this by proposing code changes based on a comment’s text.

Google's code review comment-resolution assistant uses machine learning to help developers address review comments more efficiently. Here's a simplified explanation of how it was created, how it works, and its impact:

Modeling and training

The comment-resolution assistant uses a text-to-text machine learning model. It processes code with inline reviewer comments and predicts the necessary code edits to address these comments. The model was trained on a vast dataset, including over 3 billion examples, to handle various software-engineering tasks like code edits and error fixing. During training, it was fine-tuned to prioritize high-precision predictions, ensuring that only the most reliable suggestions are presented to users.
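
To make the model's task concrete, here is a minimal sketch of how code with an inline reviewer comment might be serialized into a text-to-text training pair. The format, function, and example below are my own illustrative assumptions; the paper does not spell out Google's exact input encoding.

    # Illustrative only: a toy serialization of (code + reviewer comment) -> edited code.
    def build_example(code_before: str, comment: str, comment_line: int, code_after: str) -> dict:
        """Pair code annotated with an inline reviewer comment with the post-edit code."""
        lines = code_before.splitlines()
        # Inject the reviewer comment inline, just above the line it refers to.
        lines.insert(comment_line, f"# REVIEWER COMMENT: {comment}")
        # The training target is simply the code after the author's fix.
        return {"input": "\n".join(lines), "target": code_after}

    example = build_example(
        code_before="def add(a, b):\n    return a - b",
        comment="This should add the operands, not subtract them.",
        comment_line=1,
        code_after="def add(a, b):\n    return a + b",
    )
    print(example["input"])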

Prototyping an assistant based on the model

The team wanted the assistant to be easy and efficient for developers to use, so they tested different designs through user studies and an internal beta test. They ultimately developed an assistant that works as follows (illustrated in the image below; a simplified code sketch also follows the list):

  1. Incoming comments: It listens for new code-review comments from reviewers.
  2. Eligible for ML fixing: It ignores irrelevant comments, such as those from automated tools, non-specific comments, comments on unsupported file types, resolved comments, and comments with manual suggestions.
  3. Generated ML predictions: It queries the model to generate a suggested code edit.
  4. If the model is confident in the prediction (above 70% precision), it posts the suggestion to downstream systems (the code-review frontend and the integrated development environment).
  5. Discovered and applied: There, the suggested edits are exposed to the user. The system also logs user interactions, such as whether they preview the suggested edit and whether they accept it.
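
For readers who think in code, here is a simplified sketch of that flow. The data shapes, the stubbed predict_edit call, and the threshold handling are my own illustrative assumptions rather than Google's internal APIs; only the overall sequence of eligibility check, prediction, confidence gate, and surfacing mirrors the steps above.

    from dataclasses import dataclass

    CONFIDENCE_THRESHOLD = 0.70  # step 4: only surface high-confidence suggestions

    @dataclass
    class ReviewComment:
        text: str
        code_context: str
        is_automated: bool = False
        is_resolved: bool = False
        has_manual_suggestion: bool = False
        file_supported: bool = True

    def is_eligible(comment: ReviewComment) -> bool:
        # Step 2: skip automated, resolved, already-suggested, or unsupported comments.
        return (not comment.is_automated and not comment.is_resolved
                and not comment.has_manual_suggestion and comment.file_supported)

    def predict_edit(comment: ReviewComment) -> tuple[str, float]:
        # Step 3: stand-in for the model call; returns (suggested_edit, confidence).
        return comment.code_context.replace("a - b", "a + b"), 0.9

    def handle_comment(comment: ReviewComment) -> str | None:
        # Step 1: invoked for each new incoming reviewer comment.
        if not is_eligible(comment):
            return None
        suggestion, confidence = predict_edit(comment)
        if confidence < CONFIDENCE_THRESHOLD:  # step 4: confidence gate
            return None
        # Step 5: the real system surfaces the edit in the review UI and IDE
        # and logs whether the author previews and applies it.
        return suggestion

    print(handle_comment(ReviewComment("Add, don't subtract.", "return a - b")))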

Deploying and refining the system

Before rolling out the system, the research team conducted several rounds of refinement, testing the model on a held-out dataset to see how well it predicted correct edits.

Then, the beta tool was deployed to a small group of “friendly” users, where it was refined further through user feedback metrics. Specifically, researchers measured the number of comments produced in a day, the number of predictions the model made, the number of those predictions that were previewed, and how many of those were applied or received a thumbs up/thumbs down.

The tool was then deployed to 50% of Google’s developer population, refined further, and finally rolled out to the full developer population.
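
The paper does not detail the rollout mechanics, but a common way to implement this kind of staged rollout is to hash each developer's identity into a stable bucket and gate on the current rollout percentage, so the enrolled population only grows as the percentage increases. A generic sketch (my assumption, not Google's actual mechanism):

    import hashlib

    def in_rollout(user_id: str, rollout_percent: float) -> bool:
        """Deterministically assign user_id to a bucket in [0, 100) and gate on it."""
        bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
        return bucket < rollout_percent

    # The same developer stays enrolled as the rollout widens from 50% to 100%.
    print(in_rollout("dev@example.com", 50))
    print(in_rollout("dev@example.com", 100))  # always True at full rollout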

Throughout this process, the research team made several important refinements to the model and system that improved performance and usability. For example, a seemingly small change to the way the suggested edits were shown to developers (a wording tweak and visual change) improved the percentage of edits previewed by developers from 20% to 30%.

Evaluating the assistant’s impact

The ultimate goal for the tool was to increase productivity. Google used quantitative metrics and qualitative feedback to measure the system’s impact. On the quantitative side, the team chose to track the following (a rough sketch of how these might be computed from logged events follows the list):

  • Acceptance rate by author: The fraction of all code-review comments that are resolved by the assistant. This measures, out of all (non-automated) comments left by human reviewers, what fraction received an ML-suggested edit that the author accepted and applied directly to their changelist.
  • Prediction coverage: This measures the percentage of comments that receive a prediction.
  • Acceptance rate by reviewer: Similarly, the team measured the percentage of predictions accepted by reviewers.
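
As a rough illustration of how these rates could be derived from logged events, here is a small sketch; the field names are hypothetical, since the paper does not describe Google's logging schema.

    # Hypothetical logged events: one dict per reviewer comment (assumes non-empty inputs).
    def compute_metrics(comments: list[dict]) -> dict:
        human = [c for c in comments if not c["automated"]]
        predicted = [c for c in human if c["got_prediction"]]
        attached = [c for c in predicted if c["reviewer_attached"]]
        applied = [c for c in attached if c["author_applied"]]
        return {
            # Acceptance rate by author: applied ML edits over all human comments.
            "acceptance_rate_by_author": len(applied) / len(human),
            # Prediction coverage: share of comments that received a prediction.
            "prediction_coverage": len(predicted) / len(human),
            # Acceptance rate by reviewer: predictions the reviewer chose to attach.
            "acceptance_rate_by_reviewer": len(attached) / len(predicted),
        }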

After several months of deployment, the tool was addressing roughly 7.5% of comments produced by code reviewers in their day-to-day work. Considering tens of millions of code-review comments are left by Google developers every year, over 7% ML-assisted comment resolution is a considerable contribution to the company’s total engineering productivity.

Additionally, around half of all eligible comments received predictions. Of those predictions, over 63% were accepted by the reviewer and attached to the comment to be sent to the author. 34% of those suggested edits were previewed by the author. Of those previewed, 70% were accepted and applied to code.
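
Read as a sequential funnel (and treating the roughly 50% coverage as applying to reviewer comments overall), these numbers reconcile with the headline figure: 50% coverage × 63% reviewer acceptance × 34% previewed × 70% applied comes out to about 7.5% of comments resolved end to end. This is my back-of-the-envelope reading of the reported rates, not a calculation from the paper.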

Qualitatively, the research team received positive feedback from developers in internal message boards, who called the assistant's suggestions "sorcery," "magic," and "impressive." For example, reviewers often found that the assistant could suggest the right changes even before they finished typing their comments. This saved time and made the review process more efficient for both reviewers and authors.

Final thoughts

I recently shared Meta’s experiment to reduce code review times, which they achieved by targeting the slowest 25% of code reviews. This study provides another example of a company making targeted improvements to the code review process as a path for improving developer productivity.


Who’s hiring right now

Here is a roundup of recent Developer Experience job openings. Find more open roles here.


That’s it for this week. Thanks for reading.

-Abi

Comments

Hamid Davoodi, Software Engineer:

For some reason I'm not able to download the paper :/

Thanks for sharing this Abi! AI-powered code review is a promising innovation from Google with the potential to significantly boost developer productivity. Dev teams typically dedicate a significant portion of their time, often between 10% and 30%, to code reviews; utilizing AI to assist with this process can free up valuable developer resources. An automated AI code review tool could be particularly beneficial compared to auto code generators like Copilot. While Copilot offers productivity gains, some organizations are hesitant due to potential intellectual property (IP) concerns related to the training data used in some AI models. Widespread adoption and generalizability are key for maximizing the impact of this technology.

Henry Hund, Building AI SRE Agents to fix on call and incident response:

Very interesting! This is also a great example of leveraging AI to streamline development processes. Implementing AI-powered tools for code reviews not only saves time but also enhances code quality and consistency.
