Google AI Making Progress in Federated Learning with Formal Differential Privacy Guarantees
Michael Spencer
A.I. Writer, researcher and curator - full-time Newsletter publication manager.
Google AI has been involved in a flurry of research papers that have piqued my interest over the last few weeks, namely this one, this one, and this.
If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here. I cannot continue to write without community support (follow the link below).
https://aisupremacy.substack.com/subscribe
AiSupremacy is a Newsletter at the intersection of A.I. and breaking news. You can keep up to date with the articles here.
Since AiSupremacy is not Synced (a great resource for AI academic research summaries), we aren’t going to go into all of them. However, let’s try to unpack this:
While my Datascience Learning Center usually goes into the technical jargon, we can also easily understand this with a bit of context.
Tracking the evolution of federated learning, differential privacy, and DP-FTRL is crucial to making sure A.I. stays secure and anonymized, and to letting machine learning work in more sensitive data fields.
Fair warning: this article is going to be a bit more technical, so if the topic doesn’t interest you, just skip it. I personally do my best to give credit where credit is due with regard to the A.I. work that Google, Microsoft, and Facebook do, which is significant for the field as a whole.
What is Federated Learning (FL)?
What is Differential Privacy (DP)?
What is DP-FTRL?
This research is important, especially as Google AI, Google Brain, and other teams like DeepMind get into healthcare AI and healthcare data.
Data Minimization and Anonymization in Federated Learning
Along with fundamentals like transparency and consent, the privacy principles of data minimization and anonymization are important in ML applications that involve sensitive data.
Federated learning systems structurally incorporate the principle of data minimization. FL only transmits minimal updates for a specific model training task (focused collection), limits access to data at all stages, processes individuals’ data as early as possible (early aggregation), and discards both collected and processed data as soon as possible (minimal retention).
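To make those principles concrete, here is a minimal sketch of a single federated averaging round in Python. This is my own illustration, not Google’s production code: clients compute updates locally, the server aggregates them as they arrive, and per-client updates are never stored.

```python
import numpy as np

def client_update(model_weights, local_data, lr=0.1):
    """Runs locally on the device; only the weight delta ever leaves it
    (focused collection -- raw data stays on the device)."""
    w = model_weights.copy()
    for x, y in local_data:
        grad = (w @ x - y) * x        # toy linear-regression gradient
        w -= lr * grad
    return w - model_weights          # transmit the update, not the data

def server_round(model_weights, client_datasets):
    """Folds each update into a running sum the moment it arrives
    (early aggregation) and keeps nothing else (minimal retention)."""
    total = np.zeros_like(model_weights, dtype=float)
    n = 0
    for data in client_datasets:
        total += client_update(model_weights, data)  # individual update
        n += 1                                       # is discarded here
    return model_weights + total / max(n, 1)
```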
Another principle that is important for models trained on user data is anonymization, meaning that the final model should not memorize information unique to a particular individual's data, e.g., phone numbers, addresses, credit card numbers. However, FL on its own does not directly tackle this problem.
In 2022 Apple, Google and Microsoft are renewing their emphasis on privacy as they move into new areas of our private and personal data and expand their Cloud Computing services.
Jeff Dean leads AI at Google, particularly on the Research and Health side. With Google moving further into the healthcare sector, this kind of research becomes more important.
The Challenging Path to Federated Learning with Differential Privacy
In 2018, Google AI introduced the DP-FedAvg algorithm, which extended the DP-SGD approach to the federated setting with user-level DP guarantees, and in 2020 Google deployed this algorithm to mobile devices for the first time. This approach ensures the training mechanism is not too sensitive to any one user's data, and empirical privacy auditing techniques rule out some forms of memorization.
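The core of the user-level DP recipe is easy to sketch: clip each user’s update to bound its norm, then add Gaussian noise scaled to that bound, so no single user can move the model by more than a bounded amount. Below is a simplified illustration of that idea, not the deployed algorithm:

```python
import numpy as np

def dp_fedavg_aggregate(client_updates, clip_norm=1.0,
                        noise_multiplier=1.0, rng=None):
    """Clip each user's update to `clip_norm`, sum, and add Gaussian
    noise proportional to the clip norm (the update's sensitivity)."""
    rng = rng or np.random.default_rng()
    total = np.zeros_like(client_updates[0], dtype=float)
    for u in client_updates:
        scale = min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
        total += u * scale                       # bounded contribution
    noise = rng.normal(0.0, noise_multiplier * clip_norm,
                       size=total.shape)
    return (total + noise) / len(client_updates)
```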
As Google has moved into the hardware, smartphone, and device business in recent years, this technology has also become more salient to its business model.
However, the amplification-via-sampling argument is essential to providing a strong DP guarantee for DP-FedAvg, and in a real-world cross-device FL system, ensuring devices are subsampled precisely and uniformly at random from a large population would be complex and hard to verify. The challenge is that devices choose when to connect (or "check in") based on many external factors (e.g., requiring the device to be idle, on unmetered WiFi, and charging), and the number of available devices can vary substantially.
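This is the gap DP-FTRL addresses: rather than relying on random sampling for its privacy guarantee, it adds correlated noise to running sums of updates via tree aggregation. Here is a minimal sketch of that mechanism (my own illustration of tree aggregation, not Google’s implementation): each dyadic interval of rounds gets one reusable noise draw, so each released prefix sum carries only O(log T) noise terms.

```python
import numpy as np

def private_prefix_sums(values, noise_std, rng=None):
    """Release running sums of `values` where each query touches only
    O(log T) noise variables -- the tree-aggregation idea that lets
    DP-FTRL avoid assuming uniform client sampling."""
    rng = rng or np.random.default_rng()
    noise = {}          # one reusable Gaussian draw per tree node
    sums, running = [], 0.0
    for t in range(1, len(values) + 1):
        running += values[t - 1]
        # Decompose [1, t] into dyadic intervals via t's binary digits.
        noisy, idx, level = running, t, 0
        while idx > 0:
            if idx % 2 == 1:                   # this node's interval is
                node = (level, idx)            # part of the prefix [1, t]
                if node not in noise:
                    noise[node] = rng.normal(0.0, noise_std)
                noisy += noise[node]
            idx //= 2
            level += 1
        sums.append(noisy)
    return sums
```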
About My Work
Did you know I also run other related Newsletters? The front pages of my Newsletters are small treasure troves of articles now.
I’ve also recently started a Newsletter of bite-size A.I. news articles.
To my knowledge, I’m among just a handful of indie media startups developing multiple Newsletters simultaneously.
FL with DP
Achieving a formal privacy guarantee requires a protocol that does all of the following: bounds any single user’s contribution to the trained model, adds noise calibrated to that bound, and holds up under the participation patterns of real devices rather than an idealized uniform sample.
While Google says it reached the milestone of deploying a production FL model using a mechanism that provides a meaningfully small zCDP (zero-concentrated differential privacy), their research journey continues.
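For readers who want the formal object behind that acronym: zero-concentrated differential privacy (Bun and Steinke, 2016) bounds the Rényi divergence between a mechanism’s outputs on neighboring datasets, as sketched below.

```latex
% A randomized mechanism M satisfies rho-zCDP if, for all neighboring
% datasets x, x' and every order alpha > 1, the Renyi divergence obeys:
\[
  D_{\alpha}\!\left(M(x)\,\|\,M(x')\right) \le \rho\,\alpha
  \qquad \text{for all } \alpha \in (1, \infty).
\]
% Smaller rho means stronger privacy; rho-zCDP implies
% (rho + 2*sqrt(rho * ln(1/delta)), delta)-DP for every delta > 0.
```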
They are still far from being able to say this approach is possible (let alone practical) for most ML models or product applications, and other approaches to private ML exist. They note that they are excited to continue the journey toward maximizing the value that ML can deliver while minimizing potential privacy costs to those who contribute training data.
I was also excited to read in Synced that a research team from Cornell University and Google Brain introduced FLASH, a model family that achieves quality on par with fully augmented Transformers while maintaining linear scalability over the context size on modern accelerators.
NOTE FROM THE AUTHOR
AiSupremacy is the fastest-growing Substack Newsletter in AI at the intersection of breaking news. It’s ranked #1 in Machine Learning as of January 22nd, 2022.
Thanks for reading!