Hype cycle: The real impact of AI code in software delivery outcomes

Has any technology so quickly and thoroughly captured the attention of tech-savvy (and tech-adjacent) industries as generative AI? Open any piece of software launched or updated in the past few years, and odds are you’ll find the sparkles emoji displayed prominently in the UI, teasing AI-powered infusions of productivity and delight.

Global investment in AI topped $200 billion in 2024 and is projected to reach as much as $1.3 trillion (more than the GDP of 155 world economies, from Saudi Arabia to Ireland to Angola) by 2030. And this massive influx of capital isn't just reshaping software for end users; it's transforming the tools developers use to create those applications.

AI coding assistants, capable of generating complex and fully functional code from simple inputs, have completely upended the world of software development. Novices are becoming 10x architects overnight, making it possible for a single person to build the next billion-dollar tech unicorn with nothing more than an LLM and a few well-defined prompts. At least, that’s what the hype says. But the tide has turned in recent months, and skepticism has started to creep into the conversation.

In this edition of the Confident Commit newsletter, we’re looking at the impact of AI code assistants on software delivery metrics. Do Copilot and similar tools deliver on their promises to improve delivery speed and quality? Or do they fall short of the lofty expectations set by tech evangelists and marketing teams? We’ll explore how AI-powered developer tools work under the hood, survey some of the existing research, and then dive into our own data to help you separate the hype from the reality.

TL;DR

  • AI coding assistants promise to put a world of coding knowledge at developers’ fingertips, making them faster, more productive, less prone to errors, and happier in their jobs.
  • Pipeline data paints a more nuanced picture: AI speeds up some coding tasks, but at the expense of more complexity, lower success rates, and longer recovery times.
  • Organizations adopting AI should be clear in their goals and commit to robust testing and human code reviews to offset potential risks.

What are AI coding assistants?

AI coding assistants are tools that use machine learning algorithms to support and speed up the development process. By offering context-aware suggestions, automating repetitive tasks, and even generating entire code snippets or functions, they can help developers write, optimize, and debug code more efficiently.

Early coding assistants, introduced in the late 1990s, were static analysis tools and basic auto-complete features integrated into IDEs. Though these tools brought massive quality-of-life improvements for developers, they lacked the ability to understand context or adapt to the developer's intent, limiting their scope to surface-level code suggestions and bug detection.

A major breakthrough came in 2017 with the development of transformer models, a type of neural network architecture first developed by Google. Transformers improved AI’s ability to process and generate natural language by allowing the model to map the relationships between words regardless of their position in a sequence. This development significantly improved the accuracy and depth of AI models, enabling them to better understand code structure and intent.

Transformer models allow AI code assistants to understand context and produce relevant code.
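
To make this concrete, here is a minimal sketch of the scaled dot-product attention mechanism at the heart of the transformer architecture, written in plain NumPy. It is a toy illustration of how every token attends to every other token regardless of position, not production model code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each token scores its relevance to every other token,
    regardless of position, then blends their values accordingly."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise relevance
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per token
    return weights @ V                               # context-aware outputs

# Toy example: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8): each token now carries context from all others
```

In a real model, the queries, keys, and values are learned projections of the token embeddings, and many attention heads run in parallel across many layers.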

The transformer architecture paved the way for large language models (LLMs) like OpenAI's GPT (Generative Pre-trained Transformer) series, which have become foundational in the field of generative AI. In 2019, Tabnine leveraged the open-source GPT-2 model to create one of the first widely-adopted AI coding assistants, demonstrating the potential of large language models in software development. GPT-3, released in June 2020, marked a major leap forward, enabling the generation of coherent and contextually relevant text—and by extension, code—at an unprecedented scale.

In June 2021, OpenAI partnered with GitHub to introduce Copilot, a coding assistant powered by OpenAI's Codex model (a descendant of GPT-3 fine-tuned on a vast corpus of open source code). Copilot's ability to generate complex code structures, understand context, and provide real-time suggestions in popular IDEs set a new standard for AI-assisted coding. Its success sparked a proliferation of AI coding assistants across the tech industry, with numerous companies developing their own tools to address various aspects of software development.

With transformers at the core, modern AI coding assistants can now generate more sophisticated and contextually relevant code completions. They are no longer limited to syntax- or keyword-based suggestions but can analyze entire codebases and offer multi-line code completions, suggest optimized solutions, and even generate full functions.

How AI coding assistants work

The models used in coding assistants are trained on vast amounts of open-source code from platforms like GitHub. This enables them to learn common patterns, structures, and practices (both good and bad) from millions of lines of code across various programming languages and frameworks.

AI coding assistants work by first breaking down a developer's input—whether it's code or natural language—into smaller parts called tokens. These tokens are then processed by a transformer model, which helps the AI understand the relationships between different parts of the input. By building context from both the immediate code and surrounding information, the AI can make sense of what the developer is aiming to accomplish.


Code assistants analyze input, understand context, and generate code suggestions using machine learning models.
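
As a rough illustration of that first step, the snippet below tokenizes a small piece of code with OpenAI’s open-source tiktoken library. Production assistants use their own model-specific tokenizers, so treat the exact token boundaries as an example rather than a specification.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

snippet = "def add(a, b):\n    return a + b"
token_ids = enc.encode(snippet)

print(token_ids)                             # integer IDs the model sees
print([enc.decode([t]) for t in token_ids])  # the text piece behind each ID
```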

To build on these capabilities, many AI coding tools use a technique called Retrieval-Augmented Generation (RAG), which allows the AI to search specialized knowledge sources, like internal code libraries or documentation, to provide more accurate and relevant suggestions. By combining this retrieved context with the model's built-in knowledge, the assistant can generate useful code snippets or recommendations in real time.
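
The sketch below shows the retrieval half of that loop in miniature: rank internal documents by similarity to the developer’s request, then prepend the best match to the prompt. The embed function is a hypothetical stand-in for a real embedding model, and the documents are invented for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: hash characters into a fixed-size vector.
    # A real system would call an embedding model here.
    vec = np.zeros(64)
    for i, ch in enumerate(text):
        vec[(i + ord(ch)) % 64] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

docs = [
    "Internal style guide: all HTTP handlers must log request IDs.",
    "Pagination helper: use cursor-based paging for list endpoints.",
    "Auth library: call verify_token() before reading user data.",
]
doc_vectors = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list[str]:
    scores = doc_vectors @ embed(query)           # cosine similarity
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "add pagination to the users list endpoint"
context = "\n".join(retrieve(query))
prompt = f"Context:\n{context}\n\nTask: {query}"
print(prompt)  # the augmented prompt the language model would receive
```

A production assistant would swap in a real embedding model and a vector store, but the shape of the loop is the same: retrieve, augment, generate.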

Many code assistants are available directly in the developer’s integrated development environment (IDE), allowing them to trigger, review, and accept code suggestions without interrupting their normal coding flow. These tools are often equipped with feedback mechanisms that help the AI refine its performance and tailor its recommendations to better suit the developer's coding style and project needs over time.

What value do AI code assistants provide?

AI code assistants emerged with grand promises: boosting productivity, reducing errors, and democratizing software development by accelerating learning and making advanced techniques accessible to novices. Companies like GitHub reported impressive stats, claiming that Copilot users completed coding tasks 55% faster and that 97% of developers had used AI coding tools in their work by 2024. Industry figures even predicted the end of human-driven coding, with computer scientist Matt Welsh envisioning entire programs written by AI and Nvidia's CEO Jensen Huang foreseeing a future where programming becomes obsolete.

However, as AI tools transitioned from novelty to norm, independent research painted a more nuanced picture. A 2023 DevOps.com study found only a 5% productivity increase among Copilot users, while 2024 research from Princeton and MIT showed a 26% boost, but mainly for less experienced developers handling routine tasks.

Meanwhile, studies began to raise concerns about code quality and security. GitClear noted “downward pressure” on code quality driven by increased code churn and declining maintainability, while Uplevel found a 41% rise in bug rates alongside little to no increase in productivity.

Security implications added another layer of complexity. Stanford researchers in 2023 found that developers using AI assistants were more likely to introduce vulnerabilities while overestimating their code's security. Other researchers sounded the alarm over “package hallucinations,” where AI generates non-existent software package references, opening new potential attack vectors.

These findings reveal a double-edged sword: AI coding tools can enhance productivity, especially for routine tasks and less experienced developers. But they also introduce risks in code quality, maintainability, and security that are causing some organizations to think twice about adopting AI tools.

Data exploration: What CI/CD pipeline metrics reveal

Many of the claims about the benefits of AI-generated code, as well as its potential downsides, can be directly measured through the data generated by CI/CD platforms.

A CI/CD pipeline automates the process of integrating, testing, and deploying code changes, offering real-time insights into metrics like throughput, build success rates, workflow durations, and recovery times. These metrics help answer the key question: are AI code generators worth the trade-offs they introduce? Our analysis leverages these data points to explore the real impact of AI on software development.
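
For teams that want to run a similar analysis on their own pipelines, here is a rough sketch of how those metrics can be derived from raw run records with pandas. The field names and sample values are assumptions for illustration, not an actual CircleCI export format.

```python
import pandas as pd

# Hypothetical pipeline-run records; adapt field names to your CI provider
runs = pd.DataFrame([
    {"started": "2024-07-01 09:00", "duration_min": 18, "status": "success"},
    {"started": "2024-07-01 11:30", "duration_min": 45, "status": "failed"},
    {"started": "2024-07-01 13:10", "duration_min": 22, "status": "success"},
    {"started": "2024-07-02 10:05", "duration_min": 37, "status": "success"},
])
runs["started"] = pd.to_datetime(runs["started"])
runs = runs.sort_values("started")

throughput = runs.groupby(runs["started"].dt.date).size().mean()  # runs/day
success_rate = (runs["status"] == "success").mean()
avg_duration = runs["duration_min"].mean()

# Recovery time: gap between each failure and the next successful run
recoveries = [
    runs.loc[(runs["started"] > t) & (runs["status"] == "success"), "started"].min() - t
    for t in runs.loc[runs["status"] == "failed", "started"]
]

print(f"throughput: {throughput:.1f} runs/day")
print(f"success rate: {success_rate:.0%}, avg duration: {avg_duration:.0f} min")
print(f"recovery times: {recoveries}")
```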

Methodology

To conduct our study, we collected data from 3,000 pipeline runs triggered by 105 organizations between July and September 2024. The organizations spanned 10 representative industries:

  • Automotive
  • Banking
  • Capital Markets
  • Computer Software
  • Consumer Services
  • Education Management
  • Financial Services
  • Hospital Health Care
  • Information Technology and Services
  • Telecommunications

Since we lack direct visibility into local development environments, it’s not possible to know for certain the extent to which AI is used in individual commits (a challenge faced across the industry and with far more significant implications than just the accuracy of this study). As a proxy, we fed anonymized commit messages from pipeline-triggering code changes into an LLM (GPT-4o) and asked it to look for signs of AI assistance, assigning a confidence score and explanation for each classification.

Commit messages used in this study were sourced entirely from organizations who had opted in to sending their data to an LLM for analysis. The messages were anonymized and not retained by the model or used in further model training.
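
A simplified sketch of that classification step is shown below. The prompt wording and response handling are illustrative assumptions rather than the study’s actual implementation; it requires the openai package and an OPENAI_API_KEY in the environment.

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_commit(message: str) -> dict:
    """Ask the model whether a commit message shows signs of AI assistance."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Classify the commit message as 'ai-assisted' or 'human-only'. "
                'Respond with JSON: {"label": str, "confidence": int (0-100), '
                '"explanation": str}'
            )},
            {"role": "user", "content": message},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

print(classify_commit("Add boilerplate CRUD endpoints for the users service"))
```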

To be sure, this was not a rigorous scientific process. The confidence scores (an average of 62% for AI-assisted commits and 69% for human-only commits) reflect its inherent uncertainty. But by making some reasonable assumptions about the likelihood of AI assistance, we can look into real pipeline results for obvious signals that either corroborate or challenge the prevailing wisdom around AI’s effect on the speed and quality of software development.

Key Findings

AI code usage across industries

AI usage varied widely by industry, with Automotive leading at 17% (though this result was skewed by the small number of organizations representing this vertical).

Notably, all of these numbers fall well below the broader industry claims that 97% of developers use AI code generators and that 30% of AI-generated code suggestions are accepted into final commits. If that held true, we would expect 2,910 of the 3,000 commits we analyzed to have been made by developers who regularly use AI, and 30% of that grouping (873 commits, or 29.1% of the original 3,000) to have been AI-assisted. Instead, our analysis found 217 commits, or 7.2%, to be AI-assisted.
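
The gap is easy to verify from the figures above:

```python
total = 3000
expected_ai_users = total * 0.97                 # 2,910 commits from regular AI users
expected_ai_assisted = expected_ai_users * 0.30  # 873 commits
observed = 217                                   # what our analysis actually flagged

print(expected_ai_assisted / total)  # 0.291, the expected share
print(observed / total)              # ~0.072, the observed share
```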

Types of AI-generated commits

Analyzing the reasoning behind which commits were classified as AI-generated can be helpful in understanding how AI tools are being applied in real-world development and identifying usage patterns.

In our dataset, commits flagged as likely AI-generated were typically associated with four categories of work:

  1. Bootstrapping new features with boilerplate code: AI tools are often used to generate repetitive, foundational code that developers can then build on, saving time on setup and initialization tasks.
  2. Improving code quality: Tasks like fixing imports, improving logging, or refactoring were frequently AI-assisted, indicating that AI is well-suited for optimizing existing code and enforcing best practices.
  3. Automating or optimizing repetitive tasks through scripts: AI was effective in streamlining common tasks such as setting parameters, adding pagination, or generating utility functions, which are repetitive but essential for development.
  4. Implementing systematic changes to code structure: AI tools were also applied in tasks that involved broader, more structural code modifications, such as code cleanup or reorganization, though typically in predictable and well-defined scenarios.

That the suspected AI-generated commits fell into these categories supports current research showing AI’s strength in automating routine processes but underscores its limitations in handling more complex or nuanced work.

Pipeline performance: AI vs. human-generated code

Now that we have an idea of who is using AI code and for what purposes, can we identify any obvious impacts on code velocity or quality?

On the positive side, AI-assisted commits ran significantly faster, with an average pipeline duration of 20 minutes, compared to 53 minutes for human-written commits.

There are two possible hypotheses for this speed difference:

  1. Organizations that rely heavily on AI-generated code may have a higher tolerance for risk and prioritize speed over thoroughness, running fewer or less rigorous tests in their pipelines. This approach could result in faster execution times, as fewer checks and dependency verifications are performed, potentially increasing the risk of errors down the line.
  2. Organizations comfortable adopting AI already have robust, highly tuned pipelines in place, with tests specifically optimized for both speed and security. These organizations likely employ advanced CI/CD practices, allowing them to quickly process AI-assisted commits without sacrificing the quality or safety of their code, contributing to faster pipeline execution.

It’s also possible that organizations are combining these two approaches, limiting AI usage to non-production branches of their codebase, where errors are less costly and tests tend to be fewer or less stringent.

At the same time, AI-assisted commits in our dataset had a lower success rate—just 33% compared to 53% for human-generated code.

This indicates that while AI speeds up coding tasks, it also leads to more frequent failures, possibly due to limitations in handling complex or nuanced scenarios. AI-generated code may excel at automating routine, well-defined tasks, but it often lacks the deeper contextual understanding needed to address edge cases, integration points, or intricate logic. As a result, these commits may require more debugging, additional testing, or human intervention to bring the code up to production standards.

Broader performance trends: Comparing pre- and post-ChatGPT

We’ve seen so far that organizations appear to be making some modest gains in productivity using AI coding tools. We’ve compared AI-generated commits to human-only commits and observed that while AI tools are driving improvements in pipeline speed, they come with trade-offs in terms of success rates.

To provide further context, we also analyzed performance trends across our entire customer base from the year before and after the launch of ChatGPT in November 2022. This allowed us to identify any significant shifts in software delivery metrics that might be attributed to the broader adoption of AI tools.

Perhaps the most notable change after ChatGPT’s release was the significant increase in average throughput, which rose from 29 workflows per day before November 2022 to 89 per day in the year that followed.

Can all of this increase be attributed to AI tool adoption? It’s unlikely given the many factors at play, from changes in the makeup of our customer base to improvements in delivery practices at the individual org level. But directionally speaking, there is a clear trend toward higher levels of output in the immediate aftermath of AI tools becoming broadly available to software teams. We noted a similar uptick of productivity in the 2024 State of Software Delivery, attributing it to factors ranging from the rise of platform engineering to increased usage of AI tools.

Another notable trend from that time period is the marked increase in average recovery times from failed pipelines.

Taken with our previous finding that AI-generated commits were more likely to contain bugs or errors, this suggests that while AI tools may accelerate initial development, they are also likely to introduce complexities that lead to longer recovery and troubleshooting processes. The increased speed in pushing commits could be offset by the need for more frequent debugging and rework, highlighting the importance of balancing AI-driven productivity gains with robust testing and quality assurance practices.

Conclusion

Data from CI/CD pipelines largely aligns with results from other researchers in the field: AI tools increase developer output and shorten completion times, but they often carry hidden costs. While we observed significant improvements in throughput and pipeline run times, we also noted lower success rates and longer recovery times for AI-generated code.

Do these findings spell doom for the AI coding industry? Almost certainly not. It’s still early days for AI coding assistants, and the numbers so far don’t live up to the hype. But that doesn’t mean you should dismiss AI tools outright. Using AI code assistants in the right scenarios and with the right expectations can still bring your team value. And in doing so, you’ll contribute to the collective intelligence shaping these models.

Early AI tools were trained on unvetted open-source code, so it's inevitable that some bad or overly simplistic solutions will surface. But now, as developers accept or reject suggestions and run tests on the results, a valuable feedback loop is forming. Over time, with more real-world iterations, these models will improve. Those on the front lines, generating and refining code, are helping push AI tools to do better with each iteration.

Responsibly incorporating AI into your development process is possible. It just requires a bit of careful planning and vigilance:

  1. Start by clearly defining how AI tools will enhance your workflow, rather than disrupt it.
  2. Implement strong testing and continuous integration practices to catch potential issues before they become larger problems.
  3. Enforce human oversight through rigorous code reviews and pairing AI suggestions with experienced developer judgment.
  4. Invest in comprehensive onboarding and training to ensure teams can use AI effectively and understand its limitations.
  5. Regularly monitor and evaluate the impact of AI tools on code quality, security, and overall project health.

By putting these guardrails in place, teams of all sizes can tap into AI’s speed and efficiency while keeping risks to code quality and security in check. Software development is moving toward a blend of AI and human effort, where AI enhances our creativity and skills instead of taking over entirely. Equip yourself with the right tools and strategies to make the most of AI safely and effectively in this new era.

Further Reading

Risks and rewards of generative AI for software development

AI adoption for software: a guide to learning, tool selection, and delivery

CircleCI for AI/ML workflows
