Self Healing Code

Rather read with your ears? Then, I've got you covered. Check out this podcast where two LLMs talk through this blog post - Spotify and Apple Podcast.


Imagine a future where software can fix itself - just like your body heals a cut. We're moving toward truly resilient systems, and it's closer than you might think.

The exciting part? We're already seeing the first signs of this happening.

When breakthrough technologies emerge, innovation tends to snowball - and that's exactly what we're seeing with generative AI (GenAI). In 2024, LLMs are already starting to automatically fix buggy and vulnerable code.

I've discovered six ways teams tackle this challenge - two are already available products you can use today, while four are research projects. Here's what's cool: unlike a lot of AI research, these four research projects are pretty straightforward to try out yourself.



How Auto-Patching Actually Works

I want to highlight three levels of detail on how these auto-patching systems work (our levels of inception). Our first level is an overview showing the general similarities among all the different approaches.

This diagram shows us the basic workflow.


Step 1: Everything begins with code - whether written by a developer or generated by an LLM. This code enters a CI/CD pipeline (that's Continuous Integration/Continuous Delivery for non-techies). The system then scans the code for vulnerabilities using static analysis. When it finds an issue, it gathers important contextual information about the problem.
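
To make Step 1 concrete, here's a minimal sketch of the scan-and-gather stage. It assumes a scanner that can emit JSON findings (I use semgrep here, but any SAST tool with machine-readable output works); the helper names are mine, not taken from any of the six systems.

```python
import json
import subprocess
from pathlib import Path

def scan_repo(repo_path: str) -> list[dict]:
    """Run a static analyzer over the repo and return its findings.

    Assumes semgrep is installed; swap in your SAST tool of choice.
    """
    result = subprocess.run(
        ["semgrep", "--config", "auto", "--json", repo_path],
        capture_output=True, text=True,
    )
    return json.loads(result.stdout).get("results", [])

def gather_context(finding: dict, lines_of_context: int = 10) -> dict:
    """Collect the contextual info the LLM will need: the rule that fired,
    the location, and the code surrounding the flagged lines."""
    path = Path(finding["path"])
    start = finding["start"]["line"]
    end = finding["end"]["line"]
    source = path.read_text().splitlines()
    lo = max(0, start - 1 - lines_of_context)
    hi = min(len(source), end + lines_of_context)
    return {
        "file": str(path),
        "start_line": start,
        "end_line": end,
        "rule_id": finding.get("check_id", "unknown"),
        "snippet": "\n".join(source[lo:hi]),
    }
```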

Step 2: Each approach packages the vulnerability information a bit differently, but they all include three key pieces: the CWE ID, the problematic code, and its exact location. This information gets fed to the LLM for analysis. The most successful method I've found doesn't overwhelm the LLM with the entire codebase - instead, it zeroes in on just the relevant parts of the code. More on this later with LLMPatch.

Step 3: The LLM analyzes the problem and suggests several possible fixes - typically between three and eight different solutions. This multiple-solution approach increases our odds of finding the right fix.
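
Here's a rough sketch of Steps 2 and 3 together: packaging the three key pieces (CWE ID, code, location) into a focused prompt, then sampling several candidate fixes. The `llm_complete` call is a placeholder for whatever model client you use; everything else is illustrative rather than lifted from any of the papers.

```python
from dataclasses import dataclass

@dataclass
class Vulnerability:
    cwe_id: str        # e.g. "CWE-79"
    snippet: str       # just the relevant code, not the whole repo
    file: str
    start_line: int
    end_line: int

PROMPT_TEMPLATE = """You are a security engineer.
A static analyzer flagged a {cwe_id} issue in {file} (lines {start}-{end}).

Vulnerable code:
{snippet}

Return only the fixed version of this code."""

def propose_fixes(vuln: Vulnerability, n_candidates: int = 5) -> list[str]:
    """Sample several candidate patches; more candidates raise the odds
    that at least one is both valid and correct."""
    prompt = PROMPT_TEMPLATE.format(
        cwe_id=vuln.cwe_id, file=vuln.file,
        start=vuln.start_line, end=vuln.end_line, snippet=vuln.snippet,
    )
    # llm_complete(prompt, temperature) -> str is a stand-in for your
    # model client (OpenAI, Gemini, a local model, etc.).
    return [llm_complete(prompt, temperature=0.8) for _ in range(n_candidates)]
```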

Step 4: Some systems, like Google's, take an extra verification step by automatically testing each suggested fix with a unit test to ensure it doesn't break anything. Then, one or more LLMs review each fix for correctness - though what counts as "correct" varies between systems.
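
Here's a hedged sketch of the unit-test half of that verification step: apply each candidate patch in a scratch copy of the repo, run the existing test suite, and keep only the patches that don't break anything. Pytest is assumed as the test runner, and the helper names are mine.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

def passes_tests(repo_path: str, file: str, patched_file_contents: str) -> bool:
    """Apply one candidate patch in a throwaway copy of the repo and
    run the existing unit tests against it. In this sketch the patch
    is simply the full contents of the fixed file."""
    with tempfile.TemporaryDirectory() as workdir:
        repo_copy = Path(workdir) / "repo"
        shutil.copytree(repo_path, repo_copy)
        (repo_copy / file).write_text(patched_file_contents)
        result = subprocess.run(
            ["pytest", "-q"], cwd=repo_copy,
            capture_output=True, text=True,
        )
        return result.returncode == 0

def keep_non_breaking(repo_path: str, file: str, candidates: list[str]) -> list[str]:
    """Filter the LLM's suggestions down to those the test suite accepts."""
    return [c for c in candidates if passes_tests(repo_path, file, c)]
```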

This "LLM-as-judge" strategy is popular in many GenAI products because it improves quality without slowing things down. Here are two good resources to help you start…

  1. Creating a LLM-as-a-Judge That Drives Business Results
  2. Evaluating the Effectiveness of LLM-Evaluators (aka LLM-as-Judge)

Step 5: The system ranks the verified fixes and presents the best ones to the developer for final approval. These fixes appear as simple accept/reject options in their code repository - just like a suggestion in a Google Doc.

Now, let's zoom in on the most impressive approach I've found: LLMPatch.

LLMPatch Flow

Why focus on LLMPatch? Two reasons: it's the most transparent system out there, and it's incredibly capable. Unlike other systems, LLMPatch thoroughly explains its approach and can fix vulnerabilities across multiple programming languages - even brand-new security threats (what we call "zero days"). Their success rate is remarkable - they fixed 7 out of 11 zero-day vulnerabilities during testing.

Here’s the high-level flow…


Step 1: While the basic process follows our earlier framework, LLMPatch takes a unique approach to gathering vulnerability data. The researchers started by combining two massive databases: PatchDB (with over 12,000 real-world fixes) and CVEFixes (with more than 4,000 C-language fixes). Though these databases focus on C, LLMPatch's approach works across many programming languages. They carefully selected 300 cases from this collection - 200 to guide the system with examples and 100 to test it.

Step 2: When the system identifies a vulnerability, it uses a special tool called Joern to analyze it. Joern creates a map (called a Program Dependence Graph or PDG) showing how different parts of the code connect to each other. This map helps the LLM focus on exactly the right pieces of code - not just where the vulnerability is, but all the related code that might be affected.
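
In the paper, Joern does the heavy lifting; rather than guess at Joern's query syntax, here's a language-agnostic sketch of the idea using networkx. Given a dependence graph whose nodes are statements, slice out every statement the vulnerable line depends on or influences - that slice is the "map" that keeps the LLM focused.

```python
import networkx as nx

def dependence_slice(pdg: nx.DiGraph, vulnerable_node: str) -> set[str]:
    """Return the statements related to the vulnerable one: everything it
    depends on (backward slice) and everything that depends on it
    (forward slice)."""
    backward = nx.ancestors(pdg, vulnerable_node)
    forward = nx.descendants(pdg, vulnerable_node)
    return backward | forward | {vulnerable_node}

# Toy example: user input flows into a SQL query, which is then executed.
pdg = nx.DiGraph()
pdg.add_edge("uid = request.args['id']", "query = 'SELECT ... WHERE id=' + uid")
pdg.add_edge("query = 'SELECT ... WHERE id=' + uid", "db.execute(query)")

related = dependence_slice(pdg, "query = 'SELECT ... WHERE id=' + uid")
# related now contains the input, the query construction, and the execution.
```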


Step 3: The system bundles together four key pieces of information:

  • The vulnerability ID (CWE)
  • The problematic code
  • The exact location of the issue
  • Related code pieces identified by our PDG map

This focused package helps the AI understand the problem without getting lost in irrelevant code.

The LLM then analyzes this information to identify the underlying cause of the vulnerability. If it's unsure, there's a clever backup plan… another LLM steps in to provide a plain-English summary of what the vulnerable function is trying to do (it's LLMs all the way down).
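
A rough sketch of that root-cause step, with the fallback summarizer bolted on. The bundle mirrors the four pieces listed above; `llm_complete` is again a placeholder for your model client, and the prompts are illustrative, not the ones from the paper.

```python
from dataclasses import dataclass

@dataclass
class PatchContext:
    cwe_id: str
    vulnerable_code: str
    location: str          # e.g. "src/db.c:142-150"
    related_code: str      # statements pulled from the PDG slice

def analyze_root_cause(ctx: PatchContext) -> str:
    """Ask the LLM for the underlying cause; if it can't commit to one,
    fall back to a second call that summarizes what the function does,
    then retry with that extra context."""
    base_prompt = (
        f"Vulnerability {ctx.cwe_id} at {ctx.location}.\n"
        f"Vulnerable code:\n{ctx.vulnerable_code}\n"
        f"Related code:\n{ctx.related_code}\n"
        "Explain the root cause, or reply UNSURE."
    )
    answer = llm_complete(base_prompt)
    if "UNSURE" in answer:
        summary = llm_complete(
            "In plain English, what is this function trying to do?\n"
            + ctx.vulnerable_code
        )
        answer = llm_complete(base_prompt + f"\nFunction purpose: {summary}")
    return answer
```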

Below is the prompt they shared in the paper.


Step 4: Here's where LLMPatch gets really smart. After identifying the root cause, it searches through its library of 200 example cases to find similar root causes that were successfully fixed before. It collects up to eight of the most relevant examples, using them as a reference for fixing the current problem. For each new vulnerability, the system crafts a custom prompt that's specifically tailored to the problem at hand.

This custom prompt has three key parts:

  • Historical Examples: Up to 8 similar root causes and how they were fixed
  • Prompt Question: Clear instructions for what needs to be fixed
  • Response Structure: A pre-built structure that includes the root cause we found and suggested fix strategies

The system then generates five different possible fixes.
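
Here's a hedged sketch of this retrieval-and-prompting step: rank the stored cases by how similar their root causes are to the new one, keep the top eight, assemble the three-part prompt, and sample five fixes. The similarity function is passed in because the paper's exact matching method isn't reproduced here; `llm_complete` is again a placeholder.

```python
from dataclasses import dataclass

@dataclass
class SolvedCase:
    root_cause: str
    fix_strategy: str
    patch: str

def top_k_similar(root_cause: str, library: list[SolvedCase],
                  similarity, k: int = 8) -> list[SolvedCase]:
    """Pick up to k past cases whose root causes look most like ours.
    similarity(a, b) -> float could be embedding cosine similarity,
    keyword overlap, or an LLM comparison."""
    ranked = sorted(library,
                    key=lambda c: similarity(root_cause, c.root_cause),
                    reverse=True)
    return ranked[:k]

def build_dynamic_prompt(root_cause: str, vulnerable_code: str,
                         examples: list[SolvedCase]) -> str:
    """Assemble the three parts: historical examples, the question,
    and a pre-filled response structure containing the root cause."""
    history = "\n\n".join(
        f"Example root cause: {c.root_cause}\nHow it was fixed: {c.fix_strategy}"
        for c in examples
    )
    return (
        f"### Similar past fixes\n{history}\n\n"
        f"### Task\nFix the vulnerability in the code below.\n{vulnerable_code}\n\n"
        f"### Response structure\nRoot cause: {root_cause}\n"
        "Fix strategy: <fill in>\nPatched code: <fill in>"
    )

def generate_patches(prompt: str, n: int = 5) -> list[str]:
    """LLMPatch samples five candidate fixes per vulnerability."""
    return [llm_complete(prompt, temperature=0.8) for _ in range(n)]
```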


Step 5: Each proposed fix goes through a rigorous review process. A panel of LLMs (including Gemini, Claude, GPT, and Llama) examines each solution. If any of these LLM reviewers approves a fix, it's sent to the human developer for final validation.

The LLM reviewers check two crucial things (sketched in code after this list):

  1. Valid: Does it preserve the original code's functionality?
  2. Correct: Does the fix actually solve the security problem?
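
As promised, here's a minimal sketch of that review panel: each judge model checks validity and correctness, and a single approval is enough to forward the patch to a human. The four model names come from the paper, but `ask_judge` and the naive YES/NO parsing are my own placeholders.

```python
JUDGES = ["gemini", "claude", "gpt", "llama"]  # panel named in the paper

JUDGE_PROMPT = """Original code:
{original}

Proposed patch:
{patch}

Answer two questions with YES or NO:
1. VALID: does the patch preserve the original functionality?
2. CORRECT: does the patch remove the security vulnerability?"""

def judge_approves(model: str, original: str, patch: str) -> bool:
    """One judge's verdict. ask_judge(model, prompt) -> str is a placeholder
    for calling that specific model's API; parsing here is deliberately naive."""
    verdict = ask_judge(model, JUDGE_PROMPT.format(original=original, patch=patch))
    return verdict.count("YES") == 2

def any_judge_approves(original: str, patch: str) -> bool:
    """If any member of the panel signs off, the patch goes to a human."""
    return any(judge_approves(m, original, patch) for m in JUDGES)
```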


Key Lessons for Building Auto-Patching Systems

After reading through all this auto-patching content, I’ve identified three principles for building effective auto-patching tools:

  1. Focus the AI's Attention: Just like humans, AI works better when it can concentrate on relevant information. Don't throw all the code at a model hoping a quality patch comes out. Instead, narrow the scope (e.g., focusing on dependent code via the PDG) to help create more effective patches more consistently.
  2. Traditional Tools: Don't throw out traditional security tools. In every case, we rely on static analysis to discover the vulnerability, unit tests to check the code, and quality CI/CD pipelines to automate the process. As we can see, LLMs help automate this even further, but they are not replacing everything (...yet).
  3. Dynamic Prompting: Systems that automatically adjust their approach based on each specific vulnerability perform significantly better. LLMPatch does a great job with this by creating a series of automated steps before the final patch-generation prompt, such as extracting relevant code based on the PDG, finding the root cause, comparing root-cause examples, and bundling this together for the final prompt. If you check Table 2 on page 11 of their paper, you'll see just how important each step is for effectiveness.

Looking ahead, we're witnessing the early stages of self-healing software becoming a reality. Each of these approaches represents another step toward truly resilient systems - and that future might be closer than we think.

