The ultimate guide on prompt injection
Remember little Bobby Tables?
He’s all grown up now.
What is prompt injection?
Prompt injection is a general term for a category of techniques designed to cause an LLM (Large Language Model) to produce harmful output. When applications use LLM technology to somehow respond to user input, users can give arbitrary instructions to the LLM, potentially bypassing censorship and revealing sensitive information.
This is roughly analogous to SQL injection, since it works on the same general principle of escaping the limited context of a string literal that user input is supposed to reside in, which gives the user the power to actually execute instructions. SQL injection is for the most part a solved problem at this point though, because we’ve learned to sanitize user inputs and separate them from code instructions.
Prompt injection, however, is a brand new beast. It started becoming well-known only in 2021 and 2022, and only recently with the explosion of AI-driven SaaS tools has it become a serious security concern. Perhaps you’ve heard of the story from 2023 where Chevrolet decided to put chatbots on their dealership websites, and people quickly got it to offer them cars for just $1 by just asking it to . The general public was so unaware of this attack vector that news outlets called the person who originally posted about it a “hacker” . As you’d expect, Chevy isn’t too keen to keep their chatbot’s word… but if they were in Canada they might be forced to. A court in British Columbia recently set a precedent that companies are responsible for the output of AI agents on their website since its appearance implies that the company endorses what the LLM is saying. This was decided in a case where a chatbot on Air Canada’s site misled a customer about the process for getting a flight to a funeral refunded — the customer sued for the price of the fare plus legal fees and won.
How are we to deal with this mind-blowing, potentially legal-action-inducing vulnerability? That’s an excellent question. Given that Algolia’s engineers and our friends across the industry are some of the world’s leading experts on generative AI, we’ve set out to compile the ultimate guide on mitigating the risks associated with prompt injection. Unless otherwise indicated, the information to follow comes from our in-house AI experts, but you’ll see the sources cited when it comes from extensive external research and interviews conducted by an experienced developer and technical author on our blog team.
Do you even need to use an LLM?
Before we get started on solutions, let’s do a little risk analysis. One volunteer organization involved in construction work notes in their internal guidelines that eliminating risks is the first step to safe output. Swapping dangerous ideas for less risky ones comes next, and only then do we get to solutions that involve engineering. Surely you’d agree that removing risks altogether is better than trying to mitigate or lessen them?
With that in mind, be honest about your use case: if it wasn’t the trendy thing to do, would you even be using an LLM? Is it the right tool for the job? Before we get to engineering solutions, examine whether you can remove the risky LLM tech altogether or replace it with a narrower, safer, solution. Consider these questions:
Biased plug: This vector search algorithm is actually the idea behind Algolia’s main product, NeuralSearch . This article is meant to be educational and not marketing, so instead of extolling the virtues of NeuralSearch here, feel free to read further about it with this blog post and come to your own conclusions. Because we have experience in this though, we’re going to explore these vector-based ideas more in future articles.
领英推荐
It’s not as scary as it looks, though. Those graphs of nodes actually condense into some fairly straightforward equations if you build it up from first principles. That’s the premise of a very in-depth DIY series from sentdex on YouTube called Neural Networks from Scratch , which was also worked into an interactive book of the same name . The goal was just to understand the root principles of these kinds of networks, since they produce seemingly-complex results from rather simple instructions. In a real application, you’d likely use a framework that handles most of this complex math for you like Tensorflow and Keras or PyTorch. We’ve even built one or two for this blog to use in tandem with legit LLMs. In this use case, the output of these models need only be a few nodes. If the network is trained to make a certain limited-choice decision, the combination of which nodes are on2 can determine which choice to pick.
Despite the cautious tone of the previous section, here at Algolia we’re incredibly optimistic about generative AI (genAI) when it’s used in the right context — LLMs were even used sparingly in the creation of this article. But to use them responsibly, we must understand the risks and plan accordingly. If we don’t need to expose ourselves to the vulnerabilities and costs that come along with LLMs, why should we?
Identifying and lessening risks associated with prompt injection
Say that your use case does require that you use an LLM — what then?
Well, our friends over at Prism Eval mentioned in an interview that while the ideal solution would be that the LLM is trained to not know harmful content in the first place, this is an unreasonable approach. Why? Remember that what counts as harmful can change based on the application. Talking about $1 cars is harmful content for Chevrolet, but we could easily construct a scenario where, say, a student solving a homework problem might talk about $1 cars. There, that conversation would be helpful to the student, not harmful. So if that approach isn’t going to work, what other steps can we take?
Remember how during the COVID-19 pandemic, we were advised of many different precautions we could take to slow the spread of the virus and protect ourselves from infection? None of the individual methods were 100% effective, but they were very effective as a group. Each precaution caught much of the potential for infection that the previous precaution missed, and the next precaution caught even more. This is known as the “Swiss cheese” model for risk prevention.
Let’s apply that model to the risks associated with LLMs: if we can identify specific attack vectors and develop strategies to counteract them, we should be able to stack those strategies up next to each other and drastically increase our coverage.
This is by no means an exhaustive list, and it should be clear that this is still an area of active research, but we’ll focus in on these five categories of solutions.
Continue reading The Ultimate Guide on Prompt Injection
Super cool article! If you are in this situation, you need to check out our open-source products. We have multiple solutions for this problem. https://github.com/openshieldai/openshield
Founder & CEO, Group 8 Security Solutions Inc. DBA Machine Learning Intelligence
4 个月Great insights.