Agent Exploit
Brad Edwards
SecOps + Platform Manager @ BCLC | EX-RCMP Fraud and Digital Forensics | MSc CS with AI Student | Full-Stack Dev | CSPO
In "LLM Agents Can Autonomously Exploit One-day Vulnerabilities," Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang show how an LLM agent built with GPT-4 and the ReAct agent framework can exploit one-day vulnerabilities using the descriptions in Common Vulnerabilities and Exposures (CVE) notices.
CVEs are notices to the tech industry about vulnerabilities in software. Cybersecurity defenders follow them closely to identify risks; attackers mine them to learn about ways to exploit systems. For clarity, ReAct is not React, the front-end software library: it is a framework for building agentic systems on top of the reasoning and natural-language abilities of LLMs. One-day vulnerabilities have been publicly disclosed but not yet patched.
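To make the CVE side concrete, here is a small sketch (my own illustration, not from the paper) of pulling the description text for a CVE from NIST's public NVD 2.0 REST API. The endpoint is the real NVD API; the actual network request is left out, and the example CVE ID (Log4Shell) is just for demonstration.

```python
# Sketch: querying NIST's public NVD 2.0 REST API for the description text
# that an agent (or a defender) would read for a given CVE.
from urllib.parse import urlencode

NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def nvd_query_url(cve_id: str) -> str:
    """Return the NVD REST URL for a single CVE record."""
    return f"{NVD_API}?{urlencode({'cveId': cve_id})}"

def extract_description(record: dict) -> str:
    """Pull the English description out of an NVD 2.0 API response."""
    for vuln in record.get("vulnerabilities", []):
        for desc in vuln["cve"].get("descriptions", []):
            if desc["lang"] == "en":
                return desc["value"]
    return ""

# The URL a tool-using agent would fetch for Log4Shell:
print(nvd_query_url("CVE-2021-44228"))
```

Fetching that URL (e.g. with `requests.get`) and passing the JSON to `extract_description` yields exactly the kind of detailed write-up the paper shows the agent depends on.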
Auto Exploitation
The research demonstrates that GPT-4 can autonomously exploit 87% of one-day vulnerabilities from a dataset of 15 real-world examples. Other models (GPT-3.5, open-source LLMs) and vulnerability scanners (ZAP, Metasploit) fail to achieve any successful exploits. The agent's exploit performance drops to 7% without the CVE descriptions, showing how heavily it depends on those detailed write-ups.
The study uses the ReAct agent framework in LangChain, giving the agent tools for web browsing, terminal access, and code interpretation. GPT-4's success is attributed to its ability to use these tools effectively. The authors' cost analysis suggests that using GPT-4 for exploitation is about 2.8 times cheaper than human labour, highlighting its potential scalability and cost-effectiveness in cyber operations.
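For readers unfamiliar with ReAct, the loop is simple: the model alternates between reasoning about the task and invoking a tool, feeding each tool's output back into its context. Below is a minimal sketch of that loop with a stubbed model and a fake terminal tool; it is my own illustration of the pattern, not the authors' LangChain code, and the "tool|argument" protocol is an invented simplification.

```python
# Minimal illustration of a ReAct-style (Reason + Act) loop. A real agent,
# like the one in the paper, would call GPT-4 and run real tools (terminal,
# browser, code interpreter); here both sides are stubs.
from typing import Callable

def run_react(llm: Callable[[str], str],
              tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        step = llm(transcript)           # model emits an action or a final answer
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        tool, arg = step.split("|", 1)   # stub protocol: "tool|argument"
        observation = tools[tool](arg)   # act, then feed the observation back
        transcript += f"\nAction: {step}\nObservation: {observation}"
    return "gave up"

# Stubbed LLM: first asks the terminal for a version banner, then finishes.
def stub_llm(transcript: str) -> str:
    if "Observation" not in transcript:
        return "terminal|cat /etc/version"
    return "Final: version checked"

tools = {"terminal": lambda cmd: "v1.2.3 (stub output)"}
print(run_react(stub_llm, tools, "check the target's software version"))
# -> version checked
```

The point of the sketch is that nothing in the loop itself is security-specific: the cybersecurity capability emerges from the model's reasoning plus ordinary tools.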
Implications for Cyber Defense:
1. Increased Threat Landscape: The ability of relatively simple LLM-based agents to autonomously exploit vulnerabilities puts more attempts at more vulnerabilities within reach of more threat actors. Tooling like this will raise the effectiveness of the proverbial script kiddies (hackers who can use existing tools but lack the skills to code on their own).
2. Defense Velocity: This is a now-term capability for attackers, in the same vein as software automation or developer augmentation more generally. The automation benefit makes exploitation attempts faster, cheaper, and easier to run at scale, all of which demand much faster time to patch or mitigate. Assuming, conservatively, that attackers gain at least a 2x speed-up, defenders must compress their response timelines to match.
3. Integration of LLMs in Defense: It is important to note that GPT-4's enabling capability here is not a cybersecurity ability; it is task and software automation. The same opportunity is available to blue teams. However, it may require software development skills (AI engineering being a subset of software engineering) and experimentation time that teams do not always allocate. That said, given the general ability of GPT-4-class LLMs (roughly undergraduate level and beyond), every team should consider using advanced LLMs for individual augmentation.
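As one concrete example of the defender-side automation these points suggest (my own sketch, not from the paper), a team under pressure to patch faster can at least automate triage: rank the patch queue so that publicly exploitable, high-severity findings land on top. The CVE IDs and scores below are made up for illustration.

```python
# Sketch: ranking a patch queue so one-day risks (public write-up exists)
# with high CVSS scores get patched first. Data is illustrative only.
from dataclasses import dataclass

@dataclass
class Finding:
    cve_id: str
    cvss: float           # CVSS base score, 0.0-10.0
    public_exploit: bool  # public exploit/write-up exists (one-day risk)

def patch_order(findings: list[Finding]) -> list[str]:
    """Publicly exploitable findings first, then by descending CVSS."""
    ranked = sorted(findings, key=lambda f: (f.public_exploit, f.cvss),
                    reverse=True)
    return [f.cve_id for f in ranked]

queue = [
    Finding("CVE-2024-0001", 9.8, False),
    Finding("CVE-2024-0002", 7.5, True),
    Finding("CVE-2024-0003", 9.1, True),
]
print(patch_order(queue))
# -> ['CVE-2024-0003', 'CVE-2024-0002', 'CVE-2024-0001']
```

Trivial as it is, this is the shape of the opportunity: a few lines of automation, possibly with an LLM summarizing each CVE, moves the needle on defense velocity.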
Read the Paper
This is a thought-provoking paper that highlights changes that have already arrived in cybersecurity and in software more generally. But as always, don't take my word for it; go read the paper!
A huge thank you to the authors for their work.
#artificialintelligence #llm #cybersecurity