Reducing Code Review Time at Google
This is the latest issue of my newsletter. Each week I share research and perspectives on developer productivity. Subscribe to get future issues.
This week I read Resolving Code Review Comments with Machine Learning from Google. Code reviews are a critical part of the software development process at Google, but they take a significant amount of time. Researchers looked for a way to speed up code reviews while maintaining quality. This paper documents their solution and results.
My summary of the paper
Developers at Google spend a lot of time "shepherding" code changes through reviews, both as authors and reviewers. Even when only a single review iteration is needed, there is still a cost: it takes time to understand the reviewer's recommendation, look up relevant information, and type out the edit. Moreover, the active work time a code author must devote to addressing reviewer comments grows almost linearly with the number of comments.
For these reasons, Google created a code review comment-resolution assistant. Their goal: to reduce the time spent resolving code review comments by making it easier for reviewers to provide actionable suggestions and authors to efficiently address those suggestions. Their assistant achieves this by proposing code changes based on a comment’s text.
Google's code review comment-resolution assistant uses machine learning to help developers address review comments more efficiently. Here's a simplified explanation of how it was created, how it works, and its impact:
Modeling and training
The comment-resolution assistant uses a text-to-text machine learning model. It processes code with inline reviewer comments and predicts the necessary code edits to address these comments. The model was trained on a vast dataset, including over 3 billion examples, to handle various software-engineering tasks like code edits and error fixing. During training, it was fine-tuned to prioritize high-precision predictions, ensuring that only the most reliable suggestions are presented to users.
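The paper doesn't publish serving code, so here is a rough sketch of the two ideas described above: inlining the reviewer's comment into the code before handing it to a text-to-text model, and filtering predictions for high precision. The function names, sentinel tokens, and the 0.9 threshold are my own illustration, not Google's.

```python
# Hypothetical sketch of the assistant's input formatting and
# high-precision filtering; not Google's actual implementation.

def build_model_input(code: str, comment_line: int, comment: str) -> str:
    """Inline the reviewer's comment into the code with sentinel tokens,
    so a text-to-text model sees code and comment together."""
    lines = code.splitlines()
    lines.insert(comment_line, f"<REVIEW_COMMENT> {comment} </REVIEW_COMMENT>")
    return "\n".join(lines)

def filter_for_precision(predictions, threshold=0.9):
    """Keep only high-confidence edits. The paper describes tuning the
    system to show fewer, more reliable suggestions."""
    return [edit for edit, score in predictions if score >= threshold]

example = build_model_input("def add(a, b):\n    return a - b", 1, "should be a + b")
kept = filter_for_precision([("return a + b", 0.97), ("return a * b", 0.41)])
```

Raising the threshold trades coverage for trust: fewer comments get a suggestion, but the suggestions shown are more likely to be applied.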
Prototyping an assistant based on the model
The team wanted the assistant to be easy and efficient for developers to use, so they tested different designs through user studies and an internal beta test. They ultimately developed an assistant that fits into the existing review workflow: the model attaches a suggested edit to the reviewer's comment, and the author can preview that edit and apply it directly to the code.
Deploying and refining the system
Before rolling out the system, the research team refined the model over several iterations, testing it on a held-out dataset to see how well it predicted correct edits.
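The paper doesn't specify the exact offline metric, but one simple way to run this kind of held-out check is exact-match accuracy against the edits authors actually made. This is my own minimal sketch, not the paper's evaluation harness:

```python
def exact_match_accuracy(predicted_edits, golden_edits):
    """Fraction of held-out comments where the predicted edit is
    identical to the edit the author actually applied."""
    assert len(predicted_edits) == len(golden_edits)
    hits = sum(p == g for p, g in zip(predicted_edits, golden_edits))
    return hits / len(golden_edits)

acc = exact_match_accuracy(
    ["return a + b", "x = 1", "pass"],
    ["return a + b", "x = 2", "pass"],
)
```

Exact match is a strict lower bound, since a prediction can differ from the golden edit textually while still resolving the comment.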
Then, the beta tool was deployed to a small group of "friendly" users, where it was refined further through user feedback metrics. Specifically, researchers measured the number of comments produced in a day, the number of predictions the model made, the number of those predictions that were previewed, and how many of those were applied or received a thumbs up/thumbs down.
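Those beta measurements form a simple funnel. A sketch of how such counters might be rolled up from an event log follows; the event names are hypothetical, chosen only to mirror the stages the paper describes.

```python
from collections import Counter

def daily_funnel(events):
    """Aggregate (comment_id, stage) events into per-stage counts,
    mirroring the funnel tracked during Google's beta."""
    counts = Counter(stage for _, stage in events)
    return {
        "comments": counts["comment_created"],
        "predictions": counts["prediction_shown"],
        "previewed": counts["edit_previewed"],
        "applied": counts["edit_applied"],
        "thumbs_up": counts["thumbs_up"],
        "thumbs_down": counts["thumbs_down"],
    }

events = [
    (1, "comment_created"), (1, "prediction_shown"), (1, "edit_previewed"),
    (1, "edit_applied"), (2, "comment_created"), (2, "prediction_shown"),
    (3, "comment_created"),
]
funnel = daily_funnel(events)
```

Tracking each stage separately is what lets the team see where suggestions are dropped, rather than only an end-to-end number.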
The tool was then deployed to 50% of Google's developer population, refined further, and finally rolled out to the full population.
Throughout this process, the research team made several important refinements to the model and system that improved performance and usability. For example, a seemingly small change to the way the suggested edits were shown to developers (a wording tweak and visual change) improved the percentage of edits previewed by developers from 20% up to 30%.
Evaluating the assistant’s impact
The ultimate goal for the tool was to increase productivity, so Google used both quantitative metrics and qualitative feedback to measure the system's impact. Quantitatively, the team tracked the same funnel used during the beta: how often predictions were generated, previewed, and applied.
After several months of deployment, the tool was addressing roughly 7.5% of comments produced by code reviewers in their day-to-day work. Considering tens of millions of code-review comments are left by Google developers every year, over 7% ML-assisted comment resolution is a considerable contribution to the company’s total engineering productivity.
Additionally, around half of all eligible comments received predictions. Of those predictions, over 63% were accepted by the reviewer and attached to the comment to be sent to the author. 34% of those suggested edits were previewed by the author. Of those previewed, 70% were accepted and applied to code.
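Multiplying those funnel stages together recovers the headline figure: with roughly 50% of eligible comments receiving predictions, 63% reviewer acceptance, 34% author preview, and 70% application, the end-to-end rate lands near 7.5%.

```python
# Approximate funnel rates reported in the paper
prediction_rate = 0.50   # eligible comments that received a prediction
reviewer_accept = 0.63   # predictions the reviewer attached to a comment
author_preview  = 0.34   # attached edits previewed by the author
author_apply    = 0.70   # previewed edits applied to the code

end_to_end = prediction_rate * reviewer_accept * author_preview * author_apply
print(f"{end_to_end:.1%}")  # prints 7.5%
```

This back-of-the-envelope check shows the per-stage numbers and the 7.5% overall resolution rate are consistent with each other.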
Qualitatively, the research team received positive feedback from developers in internal message boards, who called the assistant's suggestions "sorcery," "magic," and "impressive." For example, reviewers often found that the assistant could suggest the right changes even before they finished typing their comments. This saved time and made the review process more efficient for both reviewers and authors.
Final thoughts
I recently shared Meta’s experiment to reduce code review times, which they achieved by targeting the slowest 25% of code reviews. This study provides another example of a company making targeted improvements to the code review process as a path for improving developer productivity.
Who’s hiring right now
Here is a roundup of recent Developer Experience job openings. Find more open roles here.
That’s it for this week. Thanks for reading.
-Abi