Trained ML models leak (properties of) training data
Fig. 1: Machine Learning (ML) attack scenarios

Generative AI, especially the training and fine-tuning of Large Language Models (LLMs), has renewed the discussion around the properties of training data that are inherited by the trained models. This has obvious implications for biased inferences and responses provided by LLMs.

From a privacy perspective, we show that LLMs can leak properties of the underlying training data, and that caution needs to be exercised while sharing a pre-trained LLM, especially one trained on enterprise data.

Synthetic data has been put forward as a solution in this context for generating privacy preserving data. This implies synthetic data that is close to (and generated based on) the original training data, produced in a way that is compliant with privacy regulations. We show that such claims need to be taken with a 'grain of salt', as there are numerous challenges, from a standardization and framework maturity point of view, to both making and evaluating them.

ML Attack Scenarios

Let us first consider the attack scenarios in an ML context [1, 2] (Fig. 1).

There are two broad categories of inference attacks: membership inference and property inference. A membership inference attack refers to a basic privacy violation, where the attacker's objective is to determine whether a specific user's data item was present in the training dataset. In a property inference attack, the attacker's objective is to reconstruct properties of a participant's dataset.

When the attacker does not have access to the model's parameters, they are only able to run the model (via an API) to obtain a prediction or classification. Black-box attacks [3] are still possible in this case: the attacker can invoke/query the model and observe the relationship between inputs and outputs.
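To make this concrete, below is a minimal sketch of a black-box membership inference attack based on confidence thresholding. The model, dataset and threshold are illustrative assumptions; a real attack would typically calibrate the threshold using shadow models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical target model; the attacker only sees predictions (black-box API).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
target_model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Overfit models tend to be more confident on records they were trained on,
# so a simple attack thresholds the top-class confidence returned by the API.
threshold = 0.9  # assumption: normally calibrated via shadow models
conf_members = target_model.predict_proba(X_train).max(axis=1)    # true members
conf_non_members = target_model.predict_proba(X_out).max(axis=1)  # true non-members

print(f"Flagged as members (true members):     {(conf_members > threshold).mean():.2f}")
print(f"Flagged as members (true non-members): {(conf_non_members > threshold).mean():.2f}")
```

The gap between the two flagging rates is exactly the signal a membership inference adversary exploits.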

Privacy Risks

It has been shown [4] that

trained models (including Deep Neural Networks) may leak insights related to the underlying training dataset.

This is because (during backpropagation) gradients of a given layer of a neural network are computed using the layer’s feature values and the error from the next layer. For example, in the case of sequential fully connected layers,

h_{l+1} = W_l · h_l

the gradient of error E with respect to W_l is defined as:

∂E/∂W_l = (∂E/∂h_{l+1}) · h_l^T

That is, the gradient of W_l is the (outer) product of the error from the next layer and the features h_l; hence the gradients are correlated with the features. This is especially true if certain weights in the weight matrix are sensitive to specific features or values in the participants' dataset (for example, specific words in a language prediction model [5]).
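This correlation is easy to verify numerically: for a single linear layer, the weight gradient equals the outer product of the next layer's error and the input features, so the features are directly visible in the gradient. A minimal check, with layer sizes and loss chosen arbitrarily for illustration:

```python
import torch

# Verify dE/dW_l = delta_{l+1} · h_l^T for one linear layer (toy dimensions).
torch.manual_seed(0)

h_l = torch.randn(5)                         # features entering layer l
W_l = torch.randn(3, 5, requires_grad=True)  # weights of layer l
h_next = W_l @ h_l                           # h_{l+1} = W_l · h_l

# Some downstream error E; its exact form does not matter for the identity.
target = torch.randn(3)
E = ((h_next - target) ** 2).sum()
E.backward()

delta = 2 * (h_next - target)                # dE/dh_{l+1}
manual_grad = torch.outer(delta, h_l)        # outer product of error and features

print(torch.allclose(W_l.grad, manual_grad)) # True: the gradient exposes the features h_l
```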

Bias and Fairness

[6] defines AI/ML Bias “as a phenomenon that occurs when an algorithm produces results that are systemically prejudiced due to erroneous assumptions in the machine learning process”.

Bias in AI/ML models is often unintentional; however, it has been observed far too frequently in deployed use-cases to be taken lightly. From Google Photos labeling pictures of a black Haitian-American programmer as "gorilla" to the more recent "White Barack Obama" images, there are plenty of examples of ML models discriminating on race, gender, age, sexual orientation, etc. The unintentional nature of such biases will not prevent your enterprise from getting fined by regulatory bodies, or from facing a public backlash on social media that leads to loss of business. Even without these repercussions, it is simply ethical that AI/ML models behave fairly towards everyone, without any bias.

However, defining 'fairness' is easier said than done. Does fairness mean, e.g., that the same proportion of male and female applicants get high risk assessment scores? Or that the same level of risk results in the same score regardless of gender? It is impossible to fulfill both definitions at the same time [7].
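A toy example, with made-up numbers, shows why these two definitions cannot both hold once the underlying risk rates differ across groups:

```python
# Hypothetical counts illustrating the conflict between two fairness definitions
# when base rates differ across groups.
group_a = {"applicants": 100, "truly_high_risk": 40}
group_b = {"applicants": 100, "truly_high_risk": 10}

# A "perfect" scorer: the same true risk always yields the same score,
# so definition 2 (equal treatment of equal risk) is satisfied.
flagged_a = group_a["truly_high_risk"]
flagged_b = group_b["truly_high_risk"]

rate_a = flagged_a / group_a["applicants"]   # 0.40
rate_b = flagged_b / group_b["applicants"]   # 0.10

print(f"High-risk rate, group A: {rate_a:.0%}")
print(f"High-risk rate, group B: {rate_b:.0%}")
# Definition 1 (equal flagging rates across groups) is violated: 40% vs 10%.
# Equalizing the rates would require flagging low-risk members of group B
# (or ignoring high-risk members of group A), which breaks definition 2.
```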

Bias creeps into AI models, primarily due to the inherent bias already present in the training data. So the ‘data’ part of AI model development is key to addressing bias.

[8] provides a good classification of the different types of ‘bias’ — introduced at different stages of the AI/ML development lifecycle:

Fig. 2: AI/ML Bias types (Source: [8])

Focusing on the ‘training data’ related bias types,

  • Historical Bias: arises due to historical inequality of human decisions captured in the training data
  • Representation Bias: arises due to training data that is not representative of the actual population
  • Measurement & Aggregation Bias: arises due to improper selection and combination of features.

A detailed analysis of the training data is needed to ensure that it is representative and uniformly distributed over the target population, with respect to the selected features.
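As a minimal sketch of such an analysis, assuming a hypothetical sensitive feature (age band) and made-up counts, one can compare the training data distribution against the known population distribution:

```python
import pandas as pd
from scipy.stats import chisquare

# Hypothetical shares and counts: does the training data match the target
# population on a sensitive feature (here, age band)?
population_share = {"18-30": 0.30, "31-50": 0.40, "51+": 0.30}
training_counts  = {"18-30": 5200, "31-50": 3100, "51+": 1700}

n = sum(training_counts.values())
observed = [training_counts[k] for k in population_share]
expected = [population_share[k] * n for k in population_share]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(pd.DataFrame({"observed": observed, "expected": expected},
                   index=list(population_share)))
print(f"chi-square p-value: {p_value:.2e}")  # small p-value -> representation mismatch
```

A significant mismatch on a sensitive feature is a signal of representation bias that should be addressed (e.g., by re-sampling or re-weighting) before training.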

Synthetic Data

The availability of good quality data (in significant volumes) remains a concern for the success of AI/ML projects. Synthetic data generation aims to address this by producing high quality data that closely resembles the original data.

Generative Adversarial Networks (GANs) have proven quite effective for synthetic data generation. Intuitively, a GAN can be considered a game between two networks: a Generator network and a second Classifier (Discriminator) network. The Classifier can, e.g., be a Convolutional Neural Network (CNN) based image classifier that distinguishes samples as coming either from the actual distribution or from the Generator. Every time the Classifier is able to tell that an image is fake, i.e. it notices a difference between the two distributions, the Generator adjusts its parameters accordingly. In theory, at the end of training the Classifier is unable to distinguish the two, implying that the Generator has learned to reproduce the original data distribution.

Fig. 3: Generative Adversarial Network (GAN) architecture
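For illustration, a minimal GAN training loop along the lines of Fig. 3 might look as follows. The network sizes, learning rates and the stand-in 'real' data sampler are assumptions for the sketch, not a production recipe:

```python
import torch
import torch.nn as nn

# Minimal GAN training loop for tabular-style data (toy dimensions).
latent_dim, data_dim = 16, 8

G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def real_batch(batch_size=64):
    # Stand-in for sampling the original training data.
    return torch.randn(batch_size, data_dim) * 2 + 1

for step in range(1000):
    # 1) Train the Classifier (Discriminator): real -> 1, generated -> 0.
    real = real_batch()
    fake = G(torch.randn(real.size(0), latent_dim)).detach()
    loss_d = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(real.size(0), 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Train the Generator to make the Classifier output 1 on its samples.
    fake = G(torch.randn(real.size(0), latent_dim))
    loss_g = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

# After training, the Generator produces synthetic records resembling real_batch().
synthetic = G(torch.randn(5, latent_dim))
print(synthetic)
```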

Privacy regulations (e.g. the EU GDPR) restrict the Personally Identifiable Information (PII) that can be used for analytics. As such, there has been renewed interest in synthetic data generation as a way to produce privacy preserving data.

This implies synthetic data that is close to (and generated based on) the original training data, compliant with privacy regulations, while still allowing similar insights to be derived as from the original training data.

The premise is promising, and this has been accompanied by very optimistic messaging from both governmental organizations and commercial entities.

  • NIST Differential Privacy Synthetic Data Challenge (link): “Propose an algorithm to develop differentially private synthetic datasets to enable the protection of personally identifiable information (PII) while maintaining a dataset’s utility for analysis.”
  • Diagnosing the NHS — SynÆ (link): "ODI Leeds and NHS England will be working together to explore the potential of 'synthetic data.' This is data that has been created following the patterns identified in a real dataset but it contains no personal data, making it suitable to release as open data."
  • Statice (link): “Statice generates synthetic data — just like real data, but privacy-compliant”
  • Hazy (link): “Hazy’s synthetic data generation lets you create business insight across company, legal and compliance boundaries — without moving or exposing your data.”

While the promise of privacy preserving synthetic data is valid, the truth is that such claims need to be taken with a ‘grain of salt’ — as there are numerous challenges currently to both making and evaluating such claims.

For example, there is no agreement today (or a standard framework) on even which privacy metric to use to validate such claims.

With current synthetic data generation techniques, the level of protection varies from user to user. Due to randomness in the generation algorithms (e.g., GANs, GPTs), it is difficult to predict which features the model will learn and which ones an adversary will attack, implying that we cannot guarantee privacy protection for all users. [9] shows that synthetic data generated by a number of generative models can actually leak more information, i.e. it performs worse than the original (training) dataset with respect to privacy metrics such as Linkability and Attribute Inference.
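One simplified proxy for such an evaluation (not the framework used in [9], and with made-up data) is to compare how close synthetic records sit to the original training records versus how close fresh holdout records from the same population sit:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def dcr(candidate, reference):
    """Distance from each candidate record to its closest record in `reference`."""
    nn = NearestNeighbors(n_neighbors=1).fit(reference)
    dist, _ = nn.kneighbors(candidate)
    return dist.ravel()

rng = np.random.default_rng(0)
train     = rng.normal(size=(1000, 8))        # original training records (stand-in)
holdout   = rng.normal(size=(1000, 8))        # unseen records from the same population
synthetic = train + rng.normal(scale=0.05, size=train.shape)  # a deliberately "leaky" generator

d_synth   = dcr(synthetic, train)
d_holdout = dcr(holdout, train)

# If synthetic records are systematically closer to training records than fresh
# records from the same population, they are likely to enable linkage attacks.
print(f"median DCR synthetic->train: {np.median(d_synth):.3f}")
print(f"median DCR holdout->train:   {np.median(d_holdout):.3f}")
```

Even a simple check like this often reveals that 'privacy preserving' synthetic datasets sit uncomfortably close to the records they were generated from; a standardized evaluation framework is still missing.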

References

  1. M. Rigaki and S. Garcia. A Survey of Privacy Attacks in Machine Learning. 2020, https://arxiv.org/abs/2007.07646
  2. C. Briggs, Z. Fan, and P. Andras. A Review of Privacy-preserving Federated Learning for the Internet-of-Things, 2020, https://arxiv.org/abs/2004.11794
  3. A. Ilyas, L. Engstrom, A. Athalye, and J. Lin. Black-box Adversarial Attacks with Limited Queries and Information. In Proceedings of the 35th International Conference on Machine Learning, pages 2137–2146. PMLR, 2018, https://proceedings.mlr.press/v80/ilyas18a.html.
  4. M. Nasr, R. Shokri, and A. Houmansadr. Comprehensive Privacy Analysis of Deep Learning: Passive and Active White-box Inference Attacks against Centralized and Federated Learning. In 2019 IEEE Symposium on Security and Privacy (SP), pages 739–753, 2019.
  5. H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2017, https://arxiv.org/abs/1602.05629.
  6. SearchEnterprise AI. Machine Learning bias (AI bias) (link)
  7. K. Hao. This is how AI Bias really happens — and why it's so Hard to Fix (link)
  8. H. Suresh and J. V. Guttag. A Framework for Understanding Unintended Consequences of Machine Learning (link)
  9. T. Stadler, B. Oprisanu, and C. Troncoso. Synthetic Data — A Privacy Mirage, 2020, https://arxiv.org/abs/2011.07018.
  10. D. Biswas, K. Vidyasankar. A Privacy Framework for Hierarchical Federated Learning. In proc. of the 3rd CIKM Workshop on Privacy, Security, and Trust in Computational Intelligence (PSTCI), 2021, https://ceur-ws.org/Vol-3052/paper17.pdf

