Self Healing Code

Rather read with your ears? Then, I've got you covered. Check out this podcast where two LLMs talk through this blog post - Spotify and Apple Podcast.


Imagine a future where software can fix itself - just like your body heals a cut. We're moving toward truly resilient systems, and it's closer than you might think.

The exciting part? We're already seeing the first signs of this happening.

When breakthrough technologies emerge, innovation tends to snowball - and that's exactly what we're seeing with generative AI (GenAI). In 2024, LLMs are already starting to automatically fix buggy and vulnerable code.

I've discovered six ways teams tackle this challenge - two are already available products you can use today, while four are research projects. Here's what's cool: unlike a lot of AI research, these four research projects are pretty straightforward to try out yourself.



How Auto-Patching Actually Works

I want to highlight three levels of detail on how these auto-patching systems work (our levels of inception). Our first level is an overview showing the general similarities among all the different approaches.

This diagram shows us the basic workflow.


Step 1: Everything begins with code - whether written by a developer or generated by an LLM. This code enters a CI/CD pipeline (that's Continuous Integration/Continuous Delivery for non-techies). The system then scans the code for vulnerabilities using static analysis. When it finds an issue, it gathers important contextual information about the problem.
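
To make Step 1 concrete, here's a minimal sketch of the scan-and-gather stage. It assumes a scanner that can emit JSON findings (I use semgrep here, but any SAST tool with machine-readable output works); the helper names are mine, not taken from any of the six systems.

```python
import json
import subprocess
from pathlib import Path

def scan_repo(repo_path: str) -> list[dict]:
    """Run a static analyzer over the repo and return its findings.

    Assumes semgrep is installed; swap in your SAST tool of choice.
    """
    result = subprocess.run(
        ["semgrep", "--config", "auto", "--json", repo_path],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("results", [])

def gather_context(finding: dict, lines_of_context: int = 10) -> dict:
    """Collect the contextual info the LLM will need: the rule that fired,
    the location, and the code surrounding the flagged lines."""
    path = Path(finding["path"])
    start = finding["start"]["line"]
    end = finding["end"]["line"]
    source = path.read_text().splitlines()
    lo = max(0, start - 1 - lines_of_context)
    hi = min(len(source), end + lines_of_context)
    return {
        "file": str(path),
        "start_line": start,
        "end_line": end,
        "rule_id": finding.get("check_id", "unknown"),
        "snippet": "\n".join(source[lo:hi]),
    }
```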

Step 2: Each approach packages the vulnerability information a bit differently, but they all include three key pieces: the CWE ID, the problematic code, and its exact location. This information gets fed to the LLM for analysis. The most successful method I've found doesn't overwhelm the LLM with the entire codebase - instead, it zeroes in on just the relevant parts of the code. More on this later with LLMPatch.

Step 3: The LLM analyzes the problem and suggests several possible fixes - typically between three and eight different solutions. This multiple-solution approach increases our odds of finding the right fix.
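
Here's a rough sketch of Steps 2 and 3 together: packaging the three key pieces (CWE ID, code, location) into a focused prompt, then sampling several candidate fixes. The `llm_complete` call is a placeholder for whatever model client you use; everything else is illustrative rather than lifted from any of the papers.

```python
from dataclasses import dataclass

@dataclass
class Vulnerability:
    cwe_id: str        # e.g. "CWE-79"
    snippet: str       # just the relevant code, not the whole repo
    file: str
    start_line: int
    end_line: int

PROMPT_TEMPLATE = """You are a security engineer.
A static analyzer flagged a {cwe_id} issue in {file} (lines {start}-{end}).

Vulnerable code:
{snippet}

Return only the fixed version of this code."""

def propose_fixes(vuln: Vulnerability, n_candidates: int = 5) -> list[str]:
    """Sample several candidate patches; more candidates raise the odds
    that at least one is both valid and correct."""
    prompt = PROMPT_TEMPLATE.format(
        cwe_id=vuln.cwe_id, file=vuln.file,
        start=vuln.start_line, end=vuln.end_line, snippet=vuln.snippet,
    )
    # llm_complete(prompt, temperature) -> str is a stand-in for your
    # model client (OpenAI, Gemini, a local model, etc.).
    return [llm_complete(prompt, temperature=0.8) for _ in range(n_candidates)]
```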

Step 4: Some systems, like Google's, take an extra verification step by automatically testing each suggested fix with a unit test to ensure it doesn't break anything. Then, one or more LLMs review each fix for correctness - though what counts as "correct" varies between systems.
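
Here's a hedged sketch of the unit-test half of that verification step: apply each candidate patch in a scratch copy of the repo, run the existing test suite, and keep only the patches that don't break anything. Pytest is assumed as the test runner, and the helper names are mine.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def passes_tests(repo_path: str, file: str, patched_file_contents: str) -> bool:
    """Apply one candidate patch in a throwaway copy of the repo and
    run the existing unit tests against it. In this sketch the patch
    is simply the full contents of the fixed file."""
    with tempfile.TemporaryDirectory() as workdir:
        repo_copy = Path(workdir) / "repo"
        shutil.copytree(repo_path, repo_copy)
        (repo_copy / file).write_text(patched_file_contents)
        result = subprocess.run(
            ["pytest", "-q"], cwd=repo_copy,
            capture_output=True, text=True,
        )
        return result.returncode == 0

def keep_non_breaking(repo_path: str, file: str, candidates: list[str]) -> list[str]:
    """Filter the LLM's suggestions down to those the test suite accepts."""
    return [c for c in candidates if passes_tests(repo_path, file, c)]
```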

This "LLM-as-judge" strategy is popular in many GenAI products because it improves quality without slowing things down. Here are two good resources to help you start…

  1. Creating a LLM-as-a-Judge That Drives Business Results
  2. Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

Step 5: The system ranks the verified fixes and presents the best ones to the developer for final approval. These fixes appear as simple accept/reject options in their code repository - just like a suggestion in a Google Doc.

Now, let's zoom in on the most impressive approach I've found: LLMPatch.

LLMPatch Flow

Why focus on LLMPatch? Two reasons: it's the most transparent system out there, and it's incredibly capable. Unlike other systems, LLMPatch thoroughly explains its approach and can fix vulnerabilities across multiple programming languages - even brand-new security threats (what we call "zero days"). Their success rate is remarkable - they fixed 7 out of 11 zero-day vulnerabilities during testing.

Here’s the high-level flow…


Step 1: While the basic process follows our earlier framework, LLMPatch takes a unique approach to gathering vulnerability data. The researchers started by combining two massive databases: PatchDB (with over 12,000 real-world fixes) and CVEFixes (with more than 4,000 C-language fixes). Though these databases focus on C, LLMPatch's approach works across many programming languages. They carefully selected 300 cases from this collection - 200 to guide the system with examples and 100 to test it.

Step 2: When the system identifies a vulnerability, it uses a special tool called Joern to analyze it. Joern creates a map (called a Program Dependence Graph or PDG) showing how different parts of the code connect to each other. This map helps the LLM focus on exactly the right pieces of code - not just where the vulnerability is, but all the related code that might be affected.
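
In the paper, Joern does the heavy lifting; rather than guess at Joern's query syntax, here's a language-agnostic sketch of the idea using networkx. Given a dependence graph whose nodes are statements, slice out every statement the vulnerable line depends on or influences - that slice is the "map" that keeps the LLM focused.

```python
import networkx as nx

def dependence_slice(pdg: nx.DiGraph, vulnerable_node: str) -> set[str]:
    """Return the statements related to the vulnerable one: everything it
    depends on (backward slice) and everything that depends on it
    (forward slice)."""
    backward = nx.ancestors(pdg, vulnerable_node)
    forward = nx.descendants(pdg, vulnerable_node)
    return backward | forward | {vulnerable_node}

# Toy example: user input flows into a SQL query, which is then executed.
pdg = nx.DiGraph()
pdg.add_edge("uid = request.args['id']", "query = 'SELECT ... WHERE id=' + uid")
pdg.add_edge("query = 'SELECT ... WHERE id=' + uid", "db.execute(query)")

related = dependence_slice(pdg, "query = 'SELECT ... WHERE id=' + uid")
# related now contains the input, the query construction, and the execution.
```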


Step 3: The system bundles together four key pieces of information:

  • The vulnerability ID (CWE)
  • The problematic code
  • The exact location of the issue
  • Related code pieces identified by our PDG map

This focused package helps the AI understand the problem without getting lost in irrelevant code.

The LLM then analyzes this information to identify the underlying cause of the vulnerability. If it's unsure, there's a clever backup plan… another LLM steps in to provide a plain-English summary of what the vulnerable function is trying to do (it's LLMs all the way down).
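
A rough sketch of that root-cause step, with the fallback summarizer bolted on. The bundle mirrors the four pieces listed above; `llm_complete` is again a placeholder for your model client, and the prompts are illustrative, not the ones from the paper.

```python
from dataclasses import dataclass

@dataclass
class PatchContext:
    cwe_id: str
    vulnerable_code: str
    location: str          # e.g. "src/db.c:142-150"
    related_code: str      # statements pulled from the PDG slice

def analyze_root_cause(ctx: PatchContext) -> str:
    """Ask the LLM for the underlying cause; if it can't commit to one,
    fall back to a second call that summarizes what the function does,
    then retry with that extra context."""
    base_prompt = (
        f"Vulnerability {ctx.cwe_id} at {ctx.location}.\n"
        f"Vulnerable code:\n{ctx.vulnerable_code}\n"
        f"Related code:\n{ctx.related_code}\n"
        "Explain the root cause, or reply UNSURE."
    )
    answer = llm_complete(base_prompt)
    if "UNSURE" in answer:
        summary = llm_complete(
            "In plain English, what is this function trying to do?\n"
            + ctx.vulnerable_code
        )
        answer = llm_complete(base_prompt + f"\nFunction purpose: {summary}")
    return answer
```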

Below is the prompt they shared in the paper.


Step 4: Here's where LLMPatch gets really smart. After identifying the root cause, it searches through its library of 200 example cases to find similar root causes that were successfully fixed before. It collects up to eight of the most relevant examples, using them as a reference for fixing the current problem. For each new vulnerability, the system crafts a custom prompt that's specifically tailored to the problem at hand.

This custom prompt has three key parts:

  • Historical Examples: Up to 8 similar root causes and how they were fixed
  • Prompt Question: Clear instructions for what needs to be fixed
  • Response Structure: A pre-built structure that includes the root cause we found and suggested fix strategies

The system then generates five different possible fixes.
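
Here's a hedged sketch of this retrieval-and-prompting step: rank the stored cases by how similar their root causes are to the new one, keep the top eight, assemble the three-part prompt, and sample five fixes. The similarity function is passed in because the paper's exact matching method isn't reproduced here; `llm_complete` is again a placeholder.

```python
from dataclasses import dataclass

@dataclass
class SolvedCase:
    root_cause: str
    fix_strategy: str
    patch: str

def top_k_similar(root_cause: str, library: list[SolvedCase],
                  similarity, k: int = 8) -> list[SolvedCase]:
    """Pick up to k past cases whose root causes look most like ours.
    similarity(a, b) -> float could be embedding cosine similarity,
    keyword overlap, or an LLM comparison."""
    ranked = sorted(library,
                    key=lambda c: similarity(root_cause, c.root_cause),
                    reverse=True)
    return ranked[:k]

def build_dynamic_prompt(root_cause: str, vulnerable_code: str,
                         examples: list[SolvedCase]) -> str:
    """Assemble the three parts: historical examples, the question,
    and a pre-filled response structure containing the root cause."""
    history = "\n\n".join(
        f"Example root cause: {c.root_cause}\nHow it was fixed: {c.fix_strategy}"
        for c in examples
    )
    return (
        f"### Similar past fixes\n{history}\n\n"
        f"### Task\nFix the vulnerability in the code below.\n{vulnerable_code}\n\n"
        f"### Response structure\nRoot cause: {root_cause}\n"
        "Fix strategy: <fill in>\nPatched code: <fill in>"
    )

def generate_patches(prompt: str, n: int = 5) -> list[str]:
    """LLMPatch samples five candidate fixes per vulnerability."""
    return [llm_complete(prompt, temperature=0.8) for _ in range(n)]
```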


Step 5: Each proposed fix goes through a rigorous review process. A panel of LLMs (including Gemini, Claude, GPT, and Llama) examines each solution. If any of these LLM reviewers approves a fix, it's sent to the human developer for final validation.

The LLM reviewers check two crucial things (sketched in code after this list):

  1. Valid: Does it preserve the original code's functionality?
  2. Correct: Does the fix actually solve the security problem?
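
As promised, here's a minimal sketch of that review panel: each judge model checks validity and correctness, and a single approval is enough to forward the patch to a human. The four model names come from the paper, but `ask_judge` and the naive YES/NO parsing are my own placeholders.

```python
JUDGES = ["gemini", "claude", "gpt", "llama"]  # panel named in the paper

JUDGE_PROMPT = """Original code:
{original}

Proposed patch:
{patch}

Answer two questions with YES or NO:
1. VALID: does the patch preserve the original functionality?
2. CORRECT: does the patch remove the security vulnerability?"""

def judge_approves(model: str, original: str, patch: str) -> bool:
    """One judge's verdict. ask_judge(model, prompt) -> str is a placeholder
    for calling that specific model's API; parsing here is deliberately naive."""
    verdict = ask_judge(model, JUDGE_PROMPT.format(original=original, patch=patch))
    return verdict.count("YES") == 2

def any_judge_approves(original: str, patch: str) -> bool:
    """If any member of the panel signs off, the patch goes to a human."""
    return any(judge_approves(m, original, patch) for m in JUDGES)
```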


Key Lessons for Building Auto-Patching Systems

After reading through all this auto-patching content, I’ve identified three principles for building effective auto-patching tools:

  1. Focus the AI's Attention: Just like humans, AI works better when it can concentrate on relevant information. Don't throw all the code at a model hoping a quality patch comes out. Instead, narrow the scope (e.g., focusing on dependent code via the PDG) to help create more effective patches more consistently.
  2. Traditional Tools: Don't throw out traditional security tools. In every case, we rely on static analysis to discover the vulnerability, unit tests to check the code, and quality CI/CD pipelines to automate the process. As we can see, LLMs help automate this even further, but they are not replacing everything (...yet).
  3. Dynamic Prompting: Systems that automatically adjust their approach based on each specific vulnerability perform significantly better. LLMPatch does a great job with this by creating a series of automated steps before the final patch-generation prompt, such as extracting relevant code based on the PDG, finding the root cause, comparing root-cause examples, and bundling this together for the final prompt. If you check Table 2 on page 11 of their paper, you'll see just how important each step is for effectiveness.

Looking ahead, we're witnessing the early stages of self-healing software becoming a reality. Each of these approaches represents another step toward truly resilient systems - and that future might be closer than we think.

