Data Protection in Machine Learning
Jules Polonetsky
CEO @ Future of Privacy Forum | Advancing Responsible Data Practices
Our team at the Future of Privacy Forum has released a white paper, WARNING SIGNS: The Future of Privacy and Security in an Age of Machine Learning, exploring how machine learning systems can be exposed to new privacy and security risks and explaining approaches to data protection. As Brenda Leong, FPF Senior Counsel and Director of Artificial Intelligence and Ethics, says, “Machine learning is a powerful tool with many benefits for society and its use will continue to grow, so it is important to explain the steps creators can take to limit the risk that data could be compromised or a system manipulated.”
The white paper presents a layered approach to data protection in machine learning, recommending techniques such as noise injection, intermediaries between training data and the model, transparency of machine learning mechanisms, access controls, monitoring, documentation, testing, and debugging.
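The white paper describes these techniques at a high level rather than prescribing implementations. Purely as an illustration, the sketch below shows one common form of noise injection, the Laplace mechanism used in differential privacy, applied to aggregate counts before they reach a model; the function name, parameter values, and data are hypothetical assumptions for this example, not drawn from the paper.

```python
import numpy as np

def laplace_noise_injection(values, sensitivity=1.0, epsilon=1.0):
    """Add Laplace noise scaled by sensitivity/epsilon (the calibration
    commonly used in differential privacy) to an array of values."""
    scale = sensitivity / epsilon
    return values + np.random.laplace(loc=0.0, scale=scale, size=values.shape)

# Hypothetical usage: perturb aggregate statistics before they are used
# as training features, so the model never sees the exact values.
exact_counts = np.array([120.0, 87.0, 43.0])
noisy_counts = laplace_noise_injection(exact_counts, sensitivity=1.0, epsilon=0.5)
print(noisy_counts)
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy; the trade-off is one reason the paper pairs noise injection with other layers such as access controls and monitoring.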
“Privacy or security harms in machine learning do not necessarily require direct access to underlying data or source code,” said Andrew Burt, Immuta Chief Privacy Officer and Legal Engineer. “We explore how creators of any machine learning system can limit the risk of unintended leakage of data or unauthorized manipulation.”
The data involved in ML is exposed to risks in ways that are frequently misunderstood. While traditional software systems already have standard best practices, such as the Fair Information Practice Principles (FIPPs) to guide privacy efforts or the Confidentiality, Integrity and Availability (CIA) triad to guide security activities, no widely accepted best practices exist for the data involved in ML. Adapting existing standards or creating new ones is critical to the successful, widespread adoption of ML. Without such standards, neither privacy professionals, security practitioners, nor data scientists will be able to deploy ML with confidence that the data they steward is adequately protected. And without such protections, ML will face significant barriers to adoption.

This short white paper aims to create the beginnings of a framework for such standards by focusing on specific privacy and security vulnerabilities within ML systems. At present, we view these vulnerabilities as warning signs: either of a future in which the benefits of ML are not fully embraced, or a future in which ML’s liabilities are insufficiently addressed. Our ultimate goal is to raise awareness of the new privacy and security issues confronting ML-based systems, for everyone from the most technically proficient data scientists to the most legally knowledgeable privacy personnel, along with the many in between. Ultimately, we aim to suggest practical methods to mitigate these potential harms, thereby contributing to the privacy-protective and secure use of ML.
Co-authors of the paper are Leong, Burt, Sophie Stalla-Bourdillon, Immuta Senior Privacy Counsel and Legal Engineer, and Patrick Hall, H2O.ai Senior Director for Data Science Products.
The white paper released today builds on the analysis in Beyond Explainability: A Practical Guide to Managing Risk in Machine Learning Models, released by FPF and Immuta in June 2018.