Google AI Making Progress in Federated Learning with Formal Differential Privacy Guarantees

Google AI has been involved in a flurry of research papers that have piqued my interest over the last few weeks, namely this one, this one, and this.

If you enjoy articles about A.I. at the intersection of breaking news, join AiSupremacy here. I cannot continue to write without community support (follow the link below).

https://aisupremacy.substack.com/subscribe

AiSupremacy is a Newsletter at the intersection of A.I. and breaking news. You can keep up to date with the articles here.

Since AiSupremacy is not Synced (a great resource for AI academic research summaries), we aren’t going to go into all of them. However, let’s try to unpack this:

Google AI Blog

While my Datascience Learning Center usually digs into the technical details, we can also understand this fairly easily with a bit of context.

Tracking the evolution of federated learning, differential privacy, and DP-FTRL is crucial to making sure A.I. is secure and anonymized, and to allowing machine learning to work in more sensitive data fields.

Fair warning: this article is going to be a bit more technical, so if the topic doesn’t interest you, just skip it. I personally do my best to give credit where credit is due with regards to the A.I. work that Google, Microsoft, and Facebook do, which is significant for the field as a whole.

What is Federated Learning (FL)?

  • Federated learning is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them.
  • In 2017, Google introduced federated learning (FL), an approach that enables mobile devices to collaboratively train machine learning (ML) models while keeping the raw training data on each user's device, decoupling the ability to do ML from the need to store the data in the cloud.
  • Since its introduction, Google has continued to actively engage in FL research and deployed FL to power many features in Gboard, including next word prediction, emoji suggestion and out-of-vocabulary word discovery. Federated learning is improving the “Hey Google” detection models in Assistant, suggesting replies in Google Messages, predicting text selections, and more.
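
The collaborative training loop described above can be sketched in a few lines. The following is a minimal NumPy simulation of federated averaging on a toy linear-regression problem, not Google's production system; the client data, model, and learning rate here are all invented for illustration.

```python
import numpy as np

def client_update(weights, x, y, lr=0.1):
    """One local gradient step on a linear model; the raw (x, y) never leaves the client."""
    grad = 2 * x.T @ (x @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights, clients):
    """The server averages locally computed model updates, never the raw data."""
    updates = [client_update(weights, x, y) - weights for x, y in clients]
    return weights + np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    x = rng.normal(size=(20, 2))
    clients.append((x, x @ true_w))  # each client holds its own local dataset

w = np.zeros(2)
for _ in range(100):
    w = federated_round(w, clients)  # w converges toward true_w
```

The key property is visible in `federated_round`: only model-update vectors cross the network, while each client's `(x, y)` stays local.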

Differential Privacy (DP)

  • Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset.
  • While FL allows ML without raw data collection, differential privacy (DP) provides a quantifiable measure of data anonymization, and when applied to ML can address concerns about models memorizing sensitive user data. This too has been a top research priority, and has yielded one of the first production uses of DP for analytics with RAPPOR in 2014, our open-source DP library, Pipeline DP, and TensorFlow Privacy.
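
To make "quantifiable anonymization" concrete, here is a hedged sketch of one classic DP building block, the Gaussian mechanism: clip each person's contribution so no individual can move the result by much, then add noise scaled to that worst-case influence. The dataset and parameters are invented for illustration; this is not code from Google's libraries.

```python
import numpy as np

def dp_mean(values, lo, hi, noise_multiplier, rng):
    """Release a differentially private mean: clip each contribution to [lo, hi]
    so one person shifts the mean by at most (hi - lo) / n, then add Gaussian
    noise scaled to that sensitivity."""
    clipped = np.clip(values, lo, hi)
    sensitivity = (hi - lo) / len(values)  # max effect of one person on the mean
    noise = rng.normal(0.0, noise_multiplier * sensitivity)
    return clipped.mean() + noise

rng = np.random.default_rng(42)
ages = np.array([23, 35, 29, 41, 37, 52, 30, 27])
private_mean = dp_mean(ages, lo=0, hi=100, noise_multiplier=1.0, rng=rng)
```

The noise hides any single individual's presence while the aggregate pattern (the rough average age) survives, which is exactly the trade-off the definition above describes.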

What is DP-FTRL?

Check it out on GitHub

  • Google AI says that through a multi-year, multi-team effort spanning fundamental research and product integration, it has deployed a production ML model using federated learning with a rigorous differential privacy guarantee.
  • For this proof-of-concept deployment, they utilized the DP-FTRL algorithm to train a recurrent neural network to power next-word prediction for Spanish-language Gboard users. To Google's knowledge, this is the first production neural network trained directly on user data announced with a formal DP guarantee (technically ρ = 0.81 zero-concentrated differential privacy, zCDP, discussed in detail below). Further, the federated approach offers complementary data-minimization advantages, and the DP guarantee protects all of the data on each device, not just individual training examples.
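
The ρ = 0.81 figure is a zCDP parameter, which can be translated into the more familiar (ε, δ)-DP language. A standard conversion (due to Bun and Steinke) is ε ≤ ρ + 2·√(ρ·ln(1/δ)); the sketch below applies it, with the choice of δ being my own illustrative value, not one from the paper.

```python
import math

def zcdp_to_eps(rho, delta):
    """Standard rho-zCDP to (epsilon, delta)-DP conversion:
    epsilon = rho + 2 * sqrt(rho * ln(1 / delta))."""
    return rho + 2 * math.sqrt(rho * math.log(1 / delta))

# Translate Google's reported rho = 0.81 at an illustrative delta.
eps = zcdp_to_eps(0.81, 1e-10)
```

Smaller δ (a tighter failure probability) yields a larger ε, so a single ρ value summarizes a whole curve of (ε, δ) guarantees.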

Read the Paper on DP-FTRL

This research is especially important as Google AI, Google Brain, and other teams like DeepMind move into healthcare AI and healthcare data.


Data Minimization and Anonymization in Federated Learning

Along with fundamentals like transparency and consent, the privacy principles of data minimization and anonymization are important in ML applications that involve sensitive data.

Read Federated Learning and Privacy

Federated learning systems structurally incorporate the principle of data minimization. FL only transmits minimal updates for a specific model training task (focused collection), limits access to data at all stages, processes individuals’ data as early as possible (early aggregation), and discards both collected and processed data as soon as possible (minimal retention).

Another principle that is important for models trained on user data is anonymization, meaning that the final model should not memorize information unique to a particular individual's data, e.g., phone numbers, addresses, credit card numbers. However, FL on its own does not directly tackle this problem.

In 2022, Apple, Google, and Microsoft are renewing their emphasis on privacy as they move into new areas of our private and personal data and expand their cloud computing services.


Jeff Dean heads AI at Google, particularly Research and Health. With Google moving further into the healthcare sector, this kind of research becomes more important.

The Challenging Path to Federated Learning with Differential Privacy

In 2018, Google AI introduced the DP-FedAvg algorithm, which extended the DP-SGD approach to the federated setting with user-level DP guarantees, and in 2020 Google deployed this algorithm to mobile devices for the first time. This approach ensures the training mechanism is not too sensitive to any one user's data, and empirical privacy auditing techniques rule out some forms of memorization.
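
The core of that user-level guarantee is clip-then-noise aggregation: bound each user's entire model update (not just one training example), average, and add Gaussian noise. The sketch below shows the idea under my own invented parameters; the real DP-FedAvg pipeline involves much more (server-side optimizer state, privacy accounting, secure aggregation).

```python
import numpy as np

def dp_fedavg_aggregate(updates, clip_norm, noise_multiplier, rng):
    """User-level DP aggregation: clip each user's whole update to a bounded
    L2 norm, average, then add Gaussian noise scaled to the clip bound so no
    single user can dominate the aggregated result."""
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        clipped.append(u * min(1.0, clip_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise_std = noise_multiplier * clip_norm / len(updates)
    return mean + rng.normal(0.0, noise_std, size=mean.shape)

rng = np.random.default_rng(7)
updates = [rng.normal(size=4) for _ in range(100)]  # 100 simulated user updates
agg = dp_fedavg_aggregate(updates, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Clipping the whole update vector per user, rather than per example, is what makes the guarantee user-level: it bounds the total influence of everything on one device.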

As Google has moved into the hardware, smartphone, and device business in recent years, this technology also becomes more salient to its business model.

The amplification-via-sampling argument is essential to providing a strong DP guarantee for DP-FedAvg, but in a real-world cross-device FL system, ensuring devices are subsampled precisely and uniformly at random from a large population would be complex and hard to verify. The challenge is that devices choose when to connect (or "check in") based on many external factors (e.g., requiring the device to be idle, on unmetered WiFi, and charging), and the number of available devices can vary substantially.

Read Google AI Blog

About My Work

Did you know I also run other related Newsletters? The front pages of my Newsletters are small treasure troves of articles now.

I’ve also recently started a Newsletter of bite-size A.I. news articles.

Join A.I. Survey Newsletter

To my knowledge, I’m among just a handful of indie media startups developing multiple Newsletters simultaneously.


FL with DP

Achieving a formal privacy guarantee requires a protocol that does?all?of the following:

  • Makes progress on training even as the set of devices available varies significantly with time.
  • Maintains privacy guarantees even in the face of unexpected or arbitrary changes in device availability.
  • For efficiency, allows client devices to locally decide whether they will check in to the server in order to participate in training, independent of other devices.
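
The third requirement above is what DP-FTRL enables: because its guarantee does not depend on uniform random sampling, each device can apply a purely local participation rule. The following is a speculative illustration of what such a rule might look like; the specific conditions and the minimum-separation parameter are my own invention, not Google's actual client logic.

```python
def should_check_in(is_idle, on_unmetered_wifi, is_charging,
                    current_round, last_participated_round,
                    min_separation=4):
    """Local participation rule: device-state conditions plus a minimum number
    of rounds between contributions, decided entirely on-device with no
    server-side uniform sampling."""
    eligible = is_idle and on_unmetered_wifi and is_charging
    rested = current_round - last_participated_round >= min_separation
    return eligible and rested
```

A spacing constraint of this kind limits how often any one device contributes, which is the sort of structure a privacy accountant can reason about even when availability is otherwise arbitrary.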

While Google says it has reached the milestone of deploying a production FL model using a mechanism that provides a meaningfully small zCDP, its research journey continues.

They are still far from being able to say this approach is possible (let alone practical) for most ML models or product applications, and other approaches to private ML exist. They note that they are excited to continue the journey toward maximizing the value that ML can deliver while minimizing potential privacy costs to those who contribute training data.

I was excited to read in Synced that a research team from Cornell University and Google Brain has introduced FLASH, a model family that achieves quality on par with fully-augmented Transformers while maintaining linear scalability over context size on modern accelerators.

Read Synced Blog on FLASH

NOTE FROM THE AUTHOR


AiSupremacy is the fastest-growing Substack Newsletter in AI at the intersection of breaking news. It’s ranked #1 in Machine Learning as of January 22nd, 2022.

Thanks for reading!

Michael Spencer

A.I. Writer, researcher and curator - full-time Newsletter publication manager.
