Self-Healing Code
Dylan Davis
Enterprise AI Expert | Making automation surprisingly simple | Sharing no-fluff automation tips that actually work
Rather read with your ears? Then I've got you covered. Check out this podcast where two LLMs talk through this blog post - Spotify and Apple Podcasts.
Imagine a future where software can fix itself - just like your body heals a cut. We're moving toward truly resilient systems, and it's closer than you might think.
The exciting part? We're already seeing the first signs of this happening.
When breakthrough technologies emerge, innovation tends to snowball - and that's exactly what we're seeing with generative AI (GenAI). In 2024, LLMs are already starting to automatically fix buggy and vulnerable code.
I've discovered six ways teams tackle this challenge - two are already available products you can use today, while four are research projects. Here's what's cool: unlike a lot of AI research, these four research projects are pretty straightforward to try out yourself.
[Table: the six approaches, grouped into Product (2) and Research (4)]
How Auto-Patching Actually Works
I want to highlight three levels of detail on how these auto-patching systems work (our levels of inception). Our first level is an overview showing the general similarities among all the different approaches.
This diagram shows us the basic workflow.
Step 1: Everything begins with code - whether written by a developer or generated by an LLM. This code enters a CI/CD pipeline (that's Continuous Integration/Continuous Delivery for non-techies). The system then scans the code for vulnerabilities using static analysis. When it finds an issue, it gathers important contextual information about the problem.
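To make the scanning stage concrete, here's a minimal sketch of what it might look like in a pipeline script. I'm using Semgrep purely as a stand-in static analyzer - any SAST tool that emits CWE IDs and source locations fits this step - and the exact JSON field names vary by tool and ruleset.

```python
import json
import subprocess

def scan_repo(repo_path: str) -> list[dict]:
    """Run a static analyzer over the repo and collect structured findings.

    Semgrep is only an example here; field names vary by tool and ruleset.
    """
    proc = subprocess.run(
        ["semgrep", "--config", "auto", "--json", repo_path],
        capture_output=True, text=True,
    )
    report = json.loads(proc.stdout)

    findings = []
    for hit in report.get("results", []):
        findings.append({
            "cwe": hit["extra"]["metadata"].get("cwe"),   # e.g. ["CWE-89: ..."]
            "path": hit["path"],
            "lines": (hit["start"]["line"], hit["end"]["line"]),
            "snippet": hit["extra"].get("lines", ""),
        })
    return findings
```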
Step 2: Each approach packages the vulnerability information a bit differently, but they all include three key pieces: the CWE ID, the problematic code, and its exact location. This information is then fed to the LLM for analysis. The most successful method I've found doesn't overwhelm the LLM with the entire codebase - instead, it zeroes in on just the relevant parts of the code. More on this later with LLMPatch.
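Here's a minimal sketch of that package. The class and field names are my own illustration, not any specific system's schema - the point is how little context the LLM actually needs.

```python
from dataclasses import dataclass

@dataclass
class VulnReport:
    """The three pieces every approach sends to the LLM."""
    cwe_id: str     # e.g. "CWE-476" (NULL pointer dereference)
    code: str       # only the flagged function, not the whole repo
    location: str   # file path plus line range

def build_patch_prompt(report: VulnReport) -> str:
    # Keep the prompt narrow: flagged code plus metadata, nothing more.
    return (
        f"The code below, at {report.location}, was flagged as "
        f"{report.cwe_id}.\n\n{report.code}\n\n"
        "Propose a minimal patch that removes the vulnerability "
        "without changing the function's intended behavior."
    )
```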
Step 3: The LLM analyzes the problem and suggests several possible fixes - typically between three and eight different solutions. This multiple-solution approach increases our odds of finding the right fix.
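Generating several candidates is a one-liner with most chat APIs. Here's a sketch using the OpenAI SDK as one concrete option - the `n` parameter requests multiple completions per call, and any capable code model could stand in for the one named here.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def propose_fixes(prompt: str, n: int = 5) -> list[str]:
    """Ask for several candidate patches in a single call.

    Sampling with temperature > 0 makes the n candidates diverge,
    which is the point: more shots at a correct fix.
    """
    resp = client.chat.completions.create(
        model="gpt-4o",   # illustrative; any capable code model works
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=0.8,
    )
    return [choice.message.content for choice in resp.choices]
```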
Step 4: Some systems, like Google's, take an extra verification step by automatically testing each suggested fix with a unit test to ensure it doesn't break anything. Then, one or more LLMs review each fix for correctness - though what counts as "correct" varies between systems.
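Here's what that two-stage gate might look like, assuming a pytest-style test suite and a hypothetical `llm_complete` helper (a thin wrapper over whatever LLM API you use) - a sketch of the pattern, not any vendor's actual pipeline.

```python
import subprocess

def passes_tests(repo_path: str) -> bool:
    """Regression gate: run the project's test suite against the
    patched checkout; a fix that breaks existing tests is discarded."""
    proc = subprocess.run(["pytest", "-q"], cwd=repo_path,
                          capture_output=True, text=True)
    return proc.returncode == 0

def judge_approves(original: str, patched: str, cwe_id: str) -> bool:
    """LLM-as-judge: a second model votes on the fix.
    `llm_complete` is a hypothetical chat-completion wrapper."""
    verdict = llm_complete(
        f"Code flagged as {cwe_id}:\n{original}\n\n"
        f"Proposed patch:\n{patched}\n\n"
        "Does the patch remove the vulnerability without breaking "
        "intended behavior? Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")
```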
This "LLM-as-judge" strategy is popular in many GenAI products because it improves quality without slowing things down. Here are two good resources to help you start…
Step 5: The system ranks the verified fixes and presents the best ones to the developer for final approval. These fixes appear as simple accept/reject options in their code repository - just like a suggestion in a Google Doc.
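A plausible ranking rule - my own sketch, not a documented scheme - is that passing tests trumps judge votes:

```python
def rank_fixes(candidates: list[dict]) -> list[dict]:
    """Order verified fixes: test-passing patches first, then by
    judge approvals. The candidate schema is my own illustration."""
    return sorted(candidates,
                  key=lambda c: (c["tests_pass"], c["judge_votes"]),
                  reverse=True)

candidates = [
    {"patch": "...", "tests_pass": True,  "judge_votes": 2},
    {"patch": "...", "tests_pass": False, "judge_votes": 4},
]
best = rank_fixes(candidates)[0]  # the test-passing patch wins
```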
Now, let's zoom in on the most impressive approach I've found: LLMPatch.
LLMPatch Flow
Why focus on LLMPatch? Two reasons: it's the most transparent system out there, and it's incredibly capable. Unlike other systems, LLMPatch thoroughly explains its approach and can fix vulnerabilities across multiple programming languages - even brand-new security threats (what we call "zero-days"). The success rate is remarkable: it fixed 7 out of 11 zero-day vulnerabilities during testing.
Here’s the high-level flow…
Step 1: While the basic process follows our earlier framework, LLMPatch takes a unique approach to gathering vulnerability data. The researchers started by combining two massive databases: PatchDB (with over 12,000 real-world fixes) and CVEFixes (with more than 4,000 C-language fixes). Though these databases focus on C, LLMPatch's approach works across many programming languages. They carefully selected 300 cases from this collection - 200 to guide the system with examples and 100 to test it.
Step 2: When the system identifies a vulnerability, it uses a special tool called Joern to analyze it. Joern creates a map (called a Program Dependence Graph or PDG) showing how different parts of the code connect to each other. This map helps the LLM focus on exactly the right pieces of code - not just where the vulnerability is, but all the related code that might be affected.
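To see what the PDG buys you, here's a toy backward slice in Python. Joern builds the real graph from source code; this only shows the traversal idea that picks out the relevant statements. The dict-of-edges representation is my own simplification for illustration.

```python
from collections import deque

def backward_slice(pdg: dict[int, set[int]], sink: int) -> set[int]:
    """Collect every statement that can influence the flagged one.

    `pdg` maps each statement id to the statements it depends on
    (data or control dependence) - the kind of graph Joern produces.
    """
    seen, queue = {sink}, deque([sink])
    while queue:
        node = queue.popleft()
        for dep in pdg.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# Toy graph: statement 5 (the vulnerable line) depends on 3, which
# depends on 1; statement 4 is unrelated and stays out of the slice.
pdg = {5: {3}, 3: {1}, 4: {2}}
print(backward_slice(pdg, sink=5))  # {1, 3, 5}
```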
Step 3: The system bundles together four key pieces of information:
This focused package helps the AI understand the problem without getting lost in irrelevant code.
The LLM then analyzes this information to identify the underlying cause of the vulnerability. If it's unsure, there's a clever backup plan… another LLM steps in to provide a plain-English summary of what the vulnerable function is trying to do (it's LLMs all the way down).
Below is the prompt they shared in the paper.
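To give a flavor of the step in code, here's my own paraphrase of a root-cause prompt with the summary fallback wired in - not the paper's exact wording. `llm_complete` is the same hypothetical wrapper as before.

```python
# Illustrative paraphrase of a root-cause analysis prompt.
ROOT_CAUSE_PROMPT = """\
You are a security analyst. Given a vulnerable function and its
dependence slice, state the root cause of the vulnerability in
one or two sentences.

CWE: {cwe_id}
Vulnerable code:
{code}
Relevant slice:
{slice}
"""

def analyze_root_cause(cwe_id: str, code: str, slice_: str) -> str:
    prompt = ROOT_CAUSE_PROMPT.format(cwe_id=cwe_id, code=code, slice=slice_)
    answer = llm_complete(prompt)
    if "unsure" in answer.lower():
        # Fallback: a second LLM summarizes the function's intent in
        # plain English, then the root-cause analysis is retried.
        summary = llm_complete(f"Summarize what this function does:\n{code}")
        answer = llm_complete(prompt + f"\nFunction summary: {summary}")
    return answer
```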
Step 4: Here's where LLMPatch gets really smart. After identifying the root cause, it searches through its library of 200 example cases to find similar root causes that were successfully fixed before. It collects up to eight of the most relevant examples, using them as a reference for fixing the current problem. For each new vulnerability, the system crafts a custom prompt that's specifically tailored to the problem at hand.
This custom prompt has three key parts:
The system then generates five different possible fixes (see the sketch below).
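Here's a simplified sketch of the retrieval-and-prompt step. Plain word overlap stands in for whatever similarity measure the real system uses (embeddings, LLM comparison, etc.), and the library entry schema is my own.

```python
def retrieve_exemplars(root_cause: str, library: list[dict],
                       k: int = 8) -> list[dict]:
    """Pick up to k past cases with similar root causes.

    Each library entry: {"root_cause": str, "before": str, "after": str}.
    Word overlap is a stand-in for a real similarity measure.
    """
    words = set(root_cause.lower().split())
    return sorted(
        library,
        key=lambda ex: len(words & set(ex["root_cause"].lower().split())),
        reverse=True,
    )[:k]

def build_custom_prompt(code: str, root_cause: str,
                        exemplars: list[dict]) -> str:
    # Few-shot prompt: past root causes with their before/after fixes.
    shots = "\n\n".join(
        f"Root cause: {ex['root_cause']}\n"
        f"Before:\n{ex['before']}\nAfter:\n{ex['after']}"
        for ex in exemplars
    )
    return (f"Past vulnerabilities with similar root causes, and their "
            f"fixes:\n\n{shots}\n\n"
            f"Now fix this code (root cause: {root_cause}):\n{code}")

# Five candidates from the tailored prompt, via the hypothetical wrapper:
# fixes = [llm_complete(prompt, temperature=0.8) for _ in range(5)]
```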
Step 5: Each proposed fix goes through a rigorous review process. A panel of LLMs (including Gemini, Claude, GPT, and Llama) examines each solution. If any of these LLM reviewers approves a fix, it's sent to the human developer for final validation.
The LLM reviewers check two crucial things:
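Here's what that any-judge-approves panel might look like. The sketch assumes the two checks are, roughly, "is the vulnerability gone?" and "does the code still behave as intended?" - my reading, not a quote from the paper. Model names are illustrative, and `llm_complete` is the same hypothetical multi-provider wrapper.

```python
JUDGES = ["gemini-1.5-pro", "claude-3-5-sonnet", "gpt-4o", "llama-3-70b"]

def panel_approves(original: str, patched: str) -> bool:
    """One YES from any judge sends the patch to a human reviewer."""
    question = (
        f"Original (vulnerable) code:\n{original}\n\n"
        f"Proposed patch:\n{patched}\n\n"
        "Answer YES only if the patch (1) removes the vulnerability and "
        "(2) preserves the function's intended behavior."
    )
    return any(
        llm_complete(question, model=judge).strip().upper().startswith("YES")
        for judge in JUDGES
    )
```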
Key Lessons for Building Auto-Patching Systems
After reading through all this auto-patching content, I’ve identified three principles for building effective auto-patching tools:
Looking ahead, we're witnessing the early stages of self-healing software becoming a reality. Each of these approaches represents another step toward truly resilient systems - and that future might be closer than we think.