How to defend against Prompt Injection Attacks in AI-based Applications?
If you're an application developer or product manager who has integrated ChatGPT into your services, this article is tailored for you. It addresses prompt injection attacks in AI chatbots, a class of attacks in which crafted user input pushes the AI into giving off-purpose or inappropriate responses.
This problem often arises in applications built primarily on direct API calls to OpenAI with little backend processing. Such applications are susceptible to prompt attacks because they rely on users supplying accurate, well-intentioned prompts.
The article aims to provide insight into this challenge, using examples from simple applications affected by such attacks.
Case 1: ChatPDF?
ChatPDF is an AI-powered tool that allows users to interact with PDFs to extract information, pose questions, and obtain summaries. Users can upload PDFs and inquire about the content within. It is powered by OpenAI’s APIs.
In the intended use case, a user has a long PDF, say a 100-page legal contract, and uses ChatPDF to answer questions about the document. It is also designed to provide related information that is not in the PDF itself, which is made possible through its connection to OpenAI.
However, the application acts more like a general-purpose question-answer tool like ChatGPT rather than sticking to its specific use case. For instance, I uploaded the Terms and Conditions document of Facebook into ChatPDF and asked an irrelevant question, “Which came first, the chicken or the egg?”. As the Product Manager of the application, I would expect a generic response like, “The uploaded PDF has no information about your question.” Instead, ChatPDF provided a lengthy response, as shown in the screenshot below. The response is correct, but completely unrelated to the purpose of the application.
This instance is a classic example of a prompt attack, where the user's input is deliberately crafted to sidestep the intended purpose of the application.
Case 2: MedicalGPT?
Another instance involves an application named MedicalGPT, which is designed to address medical-related queries. However, it often functions like ChatGPT, answering a broad range of questions. For instance, I asked, “Give me one idea to become rich,” expecting a standard reply like, “This service is intended for Medical related questions only.” Surprisingly, it advised investing in the stock market, deviating from its medical focus. The response is displayed below.
This issue arises because these applications have a single layer of integration with OpenAI. To explain this, let's consider a hypothetical case of a Mental Health application powered by ChatGPT. Refer to the accompanying diagram for a visual representation of this concept.
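As a rough illustration (not the actual code of any of the products mentioned), a single-layer integration might look like the sketch below, assuming the official openai Python SDK; the function name and prompt wording are hypothetical. The user's text is forwarded directly to the model, with only a system prompt standing between the user and the API, so an off-topic or adversarial prompt flows straight through.

```python
# Hypothetical single-layer integration: one direct call to the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(user_question: str) -> str:
    # Everything hinges on the model honoring this one instruction;
    # there is no separate check on the user's input or the model's output.
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a mental-health assistant. Answer only "
                        "questions related to mental health."},
            {"role": "user", "content": user_question},
        ],
    )
    return response.choices[0].message.content
```

If the user asks "Give me one idea to become rich," nothing in this flow stops the model from simply answering it.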
Such single-layered integrations can have a significant impact on revenue-generating businesses, for example, chatbots in e-commerce applications. In the case of applications like Instacart using ChatGPT, prompt attacks can disrupt the user experience: a misaligned query could yield irrelevant responses, wasting resources and potentially decreasing conversion rates.
To mitigate this, implementing a layered response system, similar to Google’s Bard or OpenAI’s ChatGPT, is beneficial. This system involves a secondary verification process where the AI double-checks its responses against the application's intended purpose. If a response is off-target, the system prompts the user to refine their query, thereby enhancing both relevance and safety in user interactions. Refer to the accompanying diagram for a visual representation of this concept.
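A minimal sketch of such a layered response system follows, again assuming the openai Python SDK; the classifier prompt, thresholds, and refusal message are illustrative assumptions, not a prescribed implementation. A cheap first call checks whether the query fits the application's purpose, and only on-topic queries reach the main answering call.

```python
# Hypothetical two-layer flow: verify the query's intent, then answer or refuse.
from openai import OpenAI

client = OpenAI()

APP_PURPOSE = "answering mental-health questions"
OFF_TOPIC_REPLY = ("This service only answers mental-health related questions. "
                   "Please rephrase your question.")

def is_on_topic(user_question: str) -> bool:
    # Verification layer: ask the model to classify the query before answering it.
    check = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system",
             "content": f"You are a strict classifier. The application is for "
                        f"{APP_PURPOSE}. Reply with exactly YES if the user's "
                        f"message fits that purpose, otherwise reply NO."},
            {"role": "user", "content": user_question},
        ],
    )
    return check.choices[0].message.content.strip().upper().startswith("YES")

def guarded_answer(user_question: str) -> str:
    if not is_on_topic(user_question):
        return OFF_TOPIC_REPLY  # prompt the user to refine their query
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": f"You assist with {APP_PURPOSE}."},
            {"role": "user", "content": user_question},
        ],
    )
    return response.choices[0].message.content
```

The same pattern can be applied on the output side as well, checking the generated response against the application's purpose before showing it to the user.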
Adding a verification layer to AI systems does increase cost and latency, but it is essential for protecting them from prompt attacks and for keeping responses relevant to the application's purpose.
As the technology evolves rapidly, it is important for app developers and businesses building on AI to keep pace with these changes, so that their AI tools remain safe and effective, and continue to support customer engagement and smooth operations.