Indirect Prompt Injection to LLMs
Fluid Attacks
We hack your software. Comprehensive Continuous Hacking: Develop secure software from the start.
Attackers can indirectly instruct AI for malicious aims
Large language models (LLMs), widely used today in generative artificial intelligence, can be subject to attacks and function as attack vectors. This can lead to the theft of sensitive information, fraud, spreading of malware, intrusion, and alteration of AI system availability, among other incidents. While such attacks can take place directly, they can also occur indirectly. It is the latter form of attack —specifically indirect prompt injection— that we intend to discuss in this post, providing a quick and digestible account of a recent research paper by Greshake et al. in this regard.
LLMs are machine learning models of the artificial neural network type that use deep learning techniques and enormous amounts of data to process, predict, summarize and generate content, usually in the form of text. These models' functionalities are modulated by natural language prompts or instructions. LLMs are increasingly being integrated into other applications to offer users, for example, interactive chats, summaries of web searches and calls to different APIs. In other words, they are no longer stand-alone units with controlled input channels but units that receive arbitrarily retrieved inputs from various external sources.
Here is where indirect prompt injection comes in. Usually, exploitation to bypass content restrictions and gain access to the model's original instructions was confined to direct intervention (e.g., individuals directly attacking their own LLMs or public models). However, Greshake et al. have revealed that adversaries can now remotely control the model and compromise the applications' data and services and the associated users. Attackers can strategically inject malicious prompts into external data sets likely to be retrieved by the LLM for processing and output generation to achieve desired adverse effects.
Read more about this here: https://fluidattacks.com/blog/indirect-prompt-injection-llms/