Trained ML models leak (properties of) training data
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
Generative AI, esp. the training / fine-tuning of Large Language Models (LLMs), has renewed the discussion around the properties of training data inherited by LLMs. This has obvious implications for biased inferences / responses provided by LLMs.
From a privacy perspective, we show that LLMs can leak properties of the underlying training data and caution needs to be exercised while sharing a pre-trained LLM, esp. one trained on enterprise data.
Synthetic data has been purported to be a solution in this context, as a means of generating privacy-preserving data. This implies synthetic data that is close to (and generated based on) the original training data, in a way that is compliant with privacy regulations. We show that such claims need to be taken with a ‘grain of salt’, as there are numerous challenges from a standardization and framework-maturity point of view to both making and evaluating such claims.
ML Attack Scenarios
Let us first consider the attack scenarios in an ML context [1, 2] (Fig. 1).
There are mainly two broad categories of inference attacks: membership inference and property inference attacks. A membership inference attack refers to a basic privacy violation, where the attacker’s objective is to determine if a specific user data item was present in the training dataset. In property inference attacks, the attacker’s objective is to reconstruct properties of a participant’s dataset.
When the attacker does not have access to the model training parameters, they are only able to run the model (via an API) to get a prediction/classification. Black-box attacks [3] are still possible in this case, where the attacker has the ability to invoke/query the model and observe the relationships between inputs and outputs.
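As an illustration of how little access such an attacker needs, the sketch below implements a simple confidence-thresholding membership test. This is a generic illustration, not the specific attack of [3]; the target model, data and threshold are all made up for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical setup: a model the attacker can only query for predictions (black-box access).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
target_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def confidence(model, X):
    # Confidence the model assigns to its own top prediction for each query.
    return model.predict_proba(X).max(axis=1)

# Simple threshold attack: overfitted models tend to be more confident on records
# they were trained on ("members") than on unseen records ("non-members").
threshold = 0.9  # assumed; a real attacker would tune this
member_guess_train = confidence(target_model, X_train) > threshold
member_guess_out = confidence(target_model, X_out) > threshold

print("Flagged as members (actual members):     %.2f" % member_guess_train.mean())
print("Flagged as members (actual non-members): %.2f" % member_guess_out.mean())
```

The gap between the two flagged rates is exactly the signal a membership inference attacker exploits; no access to weights or gradients is required.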
Privacy Risks
It has been shown [4] that
trained models (including Deep Neural Networks) may leak insights related to the underlying training dataset.
This is because (during backpropagation) gradients of a given layer of a neural network are computed using the layer’s feature values and the error from the next layer. For example, in the case of sequential fully connected layers,
the gradient of error E with respect to W_l is defined as:

∂E/∂W_l = (∂E/∂h_{l+1}) · h_l^T

That is, the gradients of W_l are inner products of the error from the next layer and the features h_l; hence the correlation between the gradients and the features. This is esp. true if certain weights in the weight matrix are sensitive to specific features or values in the participants’ dataset (for example, specific words in a language prediction model [5]).
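A minimal PyTorch sketch (illustrative only, not taken from [4] or [5]) confirms this for a single fully connected layer: the weight gradient is exactly the outer product of the back-propagated error and the input features, so the gradient carries a scaled copy of the features.

```python
import torch

torch.manual_seed(0)

# One fully connected layer: h_{l+1} = W_l @ h_l  (bias omitted for clarity)
h_l = torch.randn(8)                        # features entering layer l
W_l = torch.randn(4, 8, requires_grad=True)
h_next = W_l @ h_l

# Pretend the error back-propagated from the next layer is some vector e,
# i.e. dE/dh_{l+1} = e, by constructing E = e . h_{l+1}.
e = torch.randn(4)
E = (e * h_next).sum()
E.backward()

# dE/dW_l = outer(e, h_l): every row of the gradient is a scaled copy of h_l,
# so whoever observes the gradient can read off the input features.
expected = torch.outer(e, h_l)
print(torch.allclose(W_l.grad, expected))   # True
```

This is the core reason why sharing gradients (e.g. in collaborative / federated training) or a model that memorizes such correlations can expose features of the participants’ data.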
Bias and Fairness
[6] defines AI/ML Bias “as a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning process”.
Bias in AI/ML models is often unintentional; however, it has been observed far too frequently in deployed use-cases to be taken lightly. From Google Photos labeling pictures of a black Haitian-American programmer as “gorilla” to the more recent “White Barack Obama” images, there are ample examples of ML models discriminating on race, gender, age, etc. The unintentional nature of such biases will not prevent your enterprise from getting fined by regulatory bodies, or facing public backlash on social media, leading to loss of business. Even without these repercussions, it is simply ethical that AI/ML models behave fairly towards everyone, without any bias. However, defining ‘fairness’ is easier said than done. Does fairness mean, e.g., that the same proportion of male and female applicants get high risk-assessment scores? Or that the same level of risk results in the same score regardless of gender? It is impossible to fulfill both definitions at the same time [7].
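A toy example (made-up numbers, purely illustrative) makes the tension concrete: when two groups have different underlying base rates of risk, a scorer that treats equal risk equally cannot also flag the same proportion of each group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy risk-assessment data: group membership and true risk level (assumed base rates).
group = rng.choice(["A", "B"], size=n)
base_rate = np.where(group == "A", 0.6, 0.3)   # group A has more truly high-risk applicants
true_risk = rng.random(n) < base_rate

# A scorer that treats equal risk equally: flags high risk exactly when true risk is high.
score_high = true_risk.copy()

for g in ["A", "B"]:
    mask = group == g
    flagged_rate = score_high[mask].mean()              # demographic parity view
    flag_given_risk = score_high[mask & true_risk].mean()  # equal treatment of equal risk
    print(f"group {g}: flagged rate = {flagged_rate:.2f}, "
          f"flag rate given high risk = {flag_given_risk:.2f}")

# Both groups get flag_given_risk = 1.00, but their flagged rates differ (~0.60 vs ~0.30):
# satisfying one fairness definition forces violating the other when base rates differ.
```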
Bias creeps into AI models primarily due to the inherent bias already present in the training data, so the ‘data’ part of AI model development is key to addressing bias.
[8] provides a good classification of the different types of ‘bias’ introduced at different stages of the AI/ML development lifecycle.
Focusing on the ‘training data’ related bias types, a detailed analysis of the training data is needed to ensure that it is representative and uniformly distributed over the target population with respect to the selected features.
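As a simple illustration of such an analysis, the hypothetical sketch below compares the distribution of one training-data feature against assumed population proportions using a chi-square test; the data and reference shares are made up for the example.

```python
import pandas as pd
from scipy.stats import chisquare

# Hypothetical training set and assumed target-population proportions.
train = pd.DataFrame({"gender": ["F", "M", "M", "M", "F", "M", "M", "M", "F", "M"]})
population_share = {"F": 0.5, "M": 0.5}   # assumed reference distribution

observed = train["gender"].value_counts()
expected = pd.Series(population_share) * len(train)

# Align categories and test whether the training sample deviates from the population.
observed = observed.reindex(expected.index, fill_value=0)
stat, p_value = chisquare(f_obs=observed, f_exp=expected)

print(observed.to_dict(), "p-value:", round(p_value, 3))
# A small p-value flags under-/over-representation worth correcting before training.
```

The same check would be repeated per sensitive feature (and per combination of features) against whatever reference distribution is appropriate for the use-case.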
Synthetic Data
The availability of good quality data (in significant volumes) remains a concern for the success of AI/ML projects. Synthetic data generation aims to provide high quality data that is synthetically generated to closely resemble the original data.
Generative Adversarial Networks (GANs) have proven quite effective for synthetic data generation. Intuitively, a GAN can be considered as a game between two networks: a Generator network and a second Classifier network. The Classifier can, e.g., be a Convolutional Neural Network (CNN) based image classification network, distinguishing samples as either coming from the actual distribution or from the Generator. Every time the Classifier is able to spot a fake sample, i.e. it notices a difference between the two distributions, the Generator adjusts its parameters accordingly. Eventually (in theory), the Classifier becomes unable to distinguish the two, implying that the Generator has learned to reproduce the original data distribution.
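The adversarial game described above boils down to an alternating training loop. The PyTorch sketch below is a generic toy GAN on 1-D data, not tied to any particular implementation; it only shows the Classifier (discriminator) and Generator updates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Original" data distribution the Generator should learn to imitate (toy 1-D Gaussian).
def real_data(n):
    return torch.randn(n, 1) * 0.5 + 3.0

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # Generator: noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # Classifier / discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    # 1. Train the Classifier to label real samples as 1 and generated samples as 0.
    real, fake = real_data(64), G(torch.randn(64, 8)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2. Train the Generator so that the Classifier labels its samples as real.
    fake = G(torch.randn(64, 8))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

# If training succeeded, the generated distribution should roughly match the real one.
with torch.no_grad():
    gen = G(torch.randn(10_000, 8))
print("real mean/std:", real_data(10_000).mean().item(), real_data(10_000).std().item())
print("fake mean/std:", gen.mean().item(), gen.std().item())
```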
Privacy regulations (e.g. the EU GDPR) restrict the Personally Identifiable Information (PII) that can be used for analytics. As such, there has been renewed interest in synthetic data generation, specifically in its ability to produce privacy-preserving synthetic data.
This implies synthetic data that is close to (and generated based on) the original training data; in such a way that is compliant with privacy regulations; while still allowing similar insights to be derived as could be derived from the original training data.
The premise is promising, and this has been accompanied by very optimistic messaging from both governmental organizations and commercial entities.
While the promise of privacy preserving synthetic data is valid, the truth is that such claims need to be taken with a ‘grain of salt’ — as there are numerous challenges currently to both making and evaluating such claims.
For example, there is no agreement today (or a standard framework) on even which privacy metric to use to validate such claims.
With current synthetic data generation techniques, the protection level varies by user. Due to randomness in the generation algorithms (e.g., GANs, GPTs), it is difficult to predict which features the model will learn and which the adversary will attack, implying that we cannot guarantee privacy protection for all users. [9] shows that synthetic data generated by a number of generative models actually leaks more information, i.e. it performs worse than the original (training) dataset with respect to privacy metrics such as Linkability and Attribute Inference.
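To make the evaluation problem concrete, one of many possible (and non-standardized) attribute-inference checks is sketched below: an attacker model is trained on the released synthetic data and then used to infer a sensitive attribute of real individuals from their other attributes. The data, model and baseline here are hypothetical stand-ins, not a reference implementation of the metrics used in [9].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_records(n):
    # Toy records: two quasi-identifiers correlated with a sensitive binary attribute.
    sensitive = rng.integers(0, 2, size=n)
    quasi = rng.normal(loc=sensitive[:, None] * 1.5, scale=1.0, size=(n, 2))
    return quasi, sensitive

X_real, s_real = make_records(2000)   # original data held by the data owner
X_syn, s_syn = make_records(2000)     # stand-in for the released synthetic data

# Attacker: learns the quasi-identifier -> sensitive-attribute mapping from the
# synthetic release, then applies it to real individuals' quasi-identifiers.
attacker = LogisticRegression().fit(X_syn, s_syn)
inferred = attacker.predict(X_real)

baseline = max(s_real.mean(), 1 - s_real.mean())   # guess the majority class
print("attribute inference accuracy:", round(accuracy_score(s_real, inferred), 3))
print("majority-class baseline:     ", round(baseline, 3))
# If the synthetic release lets the attacker beat the baseline by a wide margin,
# it is leaking information about the sensitive attribute of real individuals.
```

Even for a simple check like this, there is no agreed threshold, attacker model, or reporting convention, which is precisely the standardization gap discussed above.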
References