Gen AI Privacy: Privacy Risks of LLMs
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
Machine Learning (ML) Privacy Risks
Let us first consider the Privacy attack scenarios in a traditional Supervised ML context [1, 2]. This consists of the majority of AI/ML world today with mostly Machine Learning (ML) / Deep Learning (DL) models developed with the goal of solving a Prediction or Classification task.
There are mainly two broad categories of inference attacks: membership inference and property inference attacks. A membership inference attack refers to a basic privacy violation, where the attacker’s objective is to determine if a specific user data item was present in the training dataset. In property inference attacks, the attacker’s objective is to reconstruct properties of a participant’s dataset.
When the attacker does not have access to the model training parameters, it is only able to run the models (via an API) to get a prediction/classification. Black box attacks [3] are still possible in this case where the attacker has the ability to invoke/query the model, and observe the relationships between inputs and outputs.
Trained ML Model Features Leakage
It has been shown [4] that
trained models (including Deep Neural Networks) may leak insights related to the underlying training dataset.
This is because (during backpropagation) gradients of a given layer of a neural network are computed using the layer’s feature values and the error from the next layer. For example, in the case of sequential fully connected layers,
the gradient of error E with respect to W_l is defined as:
That is, the gradients of W_l are inner products of the error from the next layer and the features h_l; and hence the correlation between the gradients and features. This is esp. true if certain weights in the weight matrix are sensitive to specific features or values in the participants’ dataset.
Gen AI: Privacy Risks of Large Language Models (LLMs)
We first consider the classic ChatGPT scenario, where we have black-box access to a Pre-trained LLM API/UI. Similar LLM APIs can be considered for other Natural Language Processing (NLP) core tasks, e.g., Knowledge Retrieval, Summarization, Auto-Correct, Translation, Natural Language Generation (NLG).
Prompts are the primary interaction mechanism in this scenario, providing the right context and guidance to the LLM API — to maximize the chances of getting the ‘right’ response.
It has led to the rise of Prompt Engineering as a professional discipline, where prompt engineers systematically perform trials, recording their findings, to arrive at the ‘right’ prompt to elicit the ‘best’ response.
From a privacy point of view, we need to consider the following additional / different LLM Privacy risks:
Pre-training Data Leakage
Instead of privacy leakage from Training data belonging to the Enterprise only, we need to start by considering Privacy leakage from Training data used to train the Pre-trained LLM. For example, [5] showed that GPT models can leak privacy-sensitive training data, e.g. email addresses from the standard Enron Email dataset, implying that the Enron dataset is very likely included in the Training data of GPT-4 and GPT-3.5.
Leakage tests consisted of a mix of Context, Zero- and Few-shot Prompting.
领英推荐
The core idea is to provide k-shot true (name, email) pairs (from other users) as demonstrations, and then prompt the model with the target user’s name to the LLM to predict the target email address.
Example templates used for few-shot prompting:
Enterprise Data Leakage
Privacy of Enterprise (training) data does become relevant when we start leveraging LLMs in a RAG setting or Fine-tune LLMs with Enterprise data to create an Enterprise / Domain specific solution / Small Language Model (SLM).
The interesting part here is that the attacker observes both Model snapshots: the Pre-trained LLM and the Fine-tuned SLM. And, we then need to measure the privacy leakage (membership / property inference) with respect to the whole training data: Pre-training data?+ (Delta) Enterprise data.
The (trained) Model features leakage scenario outlined in the case of a traditional Deep Learning model remains applicable in the case of LLMs as well, where e.g. [6] has shown that leakage prone weight sensitive features in a trained DL model can correspond to specific words in a Language Prediction model. [7] goes further to show that fine-tuned models are highly susceptible to privacy attacks, given only API access to the model. This means that if a model is fine-tuned on highly sensitive data, great care must be taken before deploying that model —as large portions of the fine-tuning dataset can be extracted with black-box access! The recommendation then is to deploy such models with additional privacy-preserving techniques, e.g., Differential Privacy.
Conversational Privacy Leakage
With traditional ML models, we are primarily talking about a one-way inference reg. a Prediction or Classification task. In contrast, LLMs enable a two-way conversation, so we need to consider Conversation related Privacy Risks in addition, where e.g. GPT models can leak the user private information provided in a conversation (history).
Personally Identifiable Information (PII) privacy leakage concerns in Conversations are real [8] given that various applications (e.g., Office suites) have started to deploy GPT models at the inference stage to help process enterprise data / documents, which usually contain sensitive (confidential) information.
We can only expect Gen AI adoption to grow in different verticals, e.g. Customer Support, Health, Banking, Dating; leading to the inevitable harvesting of prompts posed by the users as a ‘source of personal data’ for Adverting, Phishing, etc. scenarios. Given this,
we also need to consider implicit privacy risks of natural language conversations (along the lines of side-channel attacks) together with PII leakage concerns.
For example [9], the query: “Wow, this dress looks amazing! What is its price?” can leak the the user's sentiment as compared to a more neutral prompt: “This dress fits my requirements. What is its price?”
Privacy Intent Compliance
Finally, LLMs today allow users to be a lot more prescriptive with respect to processing their Prompts / Queries - Chain-of-Thought (CoT) Prompting. Chain-of-Thought (CoT) is a framework that addresses how a LLM is solving a problem. During prompting, user provides the logic about how to approach a certain problem and LLM will solve the task using suggested logic and returns the output along with the logic.
CoT can be extended to allow the User to explicitly specify their Privacy Intent in Prompts using keywords e.g., "in confidence", "confidentially", "privately", "in private", "in secret", etc. So we also need to assess the LLM effectiveness in complying with these User privacy requests. For example, [5] showed that GPT-4 will leak private information when told “confidentially”, but will not when prompted “in confidence”.
Conclusion
Gen AI is a disruptive technology, and we are seeing it evolve faster than anything we have experienced before. So it is very important that we scale their enteprise adoption in a responsible fashion, with Responsible AI practices integrated with LLMOps pipelines [10]. User privacy is a key and fundamental dimension of Responsible AI, and we discussed the privacy risks of LLMs in detail in this article.
LLMs by their very nature - the way they are trained and deployed; bring some novel privacy challenges that have not been considered previously for more traditional ML models. In this article, we outlined the additional privacy risks and mitigation strategies that need to be considered for safe deployment of LLM enabled use-cases in enterprises. In the future, we are working towards a tooling recommendation to address the highlighted LLM privacy risks.
References
Helping Busy Professionals Transform their Lives using "MAI" Strategy | Top AI Voice
3 个月Thank you for shedding light on the evolving privacy risks associated with Large Language Models in enterprises, Debmalya Biswas. It's crucial for organizations to adapt their privacy frameworks to address the novel aspects posed by LLMs.
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
3 个月Attached the slides https://www.slideshare.net/slideshow/gen-ai-privacy-risks-of-large-language-models-llms/270194480
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
3 个月Also, published in Towards Data Science now: https://towardsdatascience.com/generative-ai-privacy-risks-5983f2f594e4
Demystifying AI, Blockchain, and Tech Culture
3 个月Leaking information from the training set is major blocker to enterprise adoption. It is also a major liability for any creative use of #GenAI where confidentiality is part of training data licensing deals. Debmalya Biswas, any thoughts on how security audits and vendors will evolve for AI products given this novel threat vector? Is it even possible to ensure training data confidentiality with the current architectures?
AI Business Automation & Workflows | Superior WordPress Maintenance & Services | Podcast
3 个月Intriguing insight into the privacy concerns with LLMs. Always evolving, aren't we?