Secure aggregation on top of Federated Learning: Same faces, blurred masks
Photo credits: DC and Marvel

This synthesis draws on the paper "Practical Secure Aggregation for Privacy-Preserving Machine Learning".

In the previous episode, we saw that federated learning (FL) is a machine learning setting where multiple participants collaborate to train a shared model without the need to share data. However, research has shown that keeping data local during the training process does not guarantee complete privacy.

In fact, one of the most prominent challenges is data reconstruction: the aggregator, which collects the local model parameters from all participants in order to aggregate them into the global model, may be able to reveal extensive information about each participant's private dataset, defeating the entire purpose of federated learning.

Today, we will synthesize two techniques that can be used to overcome this limitation:

I. Secure aggregation,

II. Differential privacy.

So, fasten your seat belt!


I. What is secure aggregation?

Secure aggregation (SA) is a protocol that has been proposed to address the problem above by preventing the aggregator from analyzing the participants' individual model updates. Current implementations of SA in FL frameworks fall under one of two main categories:

1- Multi-party computation


The first category is multi-party computation: the privacy of the locally trained models is protected by applying techniques from cryptography. More specifically, each pair of participants cooperates to generate a random mask vector (a vector of length n, where n is the number of model parameters) and uses it to obfuscate their own updates before sending them to the server. These pairwise masks have a special property: once the masked models of all the clients are summed up at the aggregator (server), the masks cancel out, so the server learns the aggregate of all models without seeing any individual client update.
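To make the cancellation property concrete, here is a minimal NumPy sketch (all names and sizes are illustrative and not taken from any particular framework): each pair of clients shares one random mask that one of them adds and the other subtracts, so the server's sum of masked updates equals the sum of the true updates.

```python
# Minimal sketch (NumPy) of pairwise masking: each pair of clients (i, j)
# agrees on a random mask vector m_ij; client i adds it, client j subtracts it,
# so every mask cancels when the server sums the masked updates.
import numpy as np

rng = np.random.default_rng(0)
n_clients, n_params = 3, 5

# Each client's true local model update (stand-in for real gradients/weights).
updates = [rng.normal(size=n_params) for _ in range(n_clients)]

# Pairwise masks: one shared random vector per unordered pair (i, j), i < j.
pair_masks = {(i, j): rng.normal(size=n_params)
              for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(i):
    """Client i obfuscates its update: +m_ij for j > i, -m_ji for j < i."""
    masked = updates[i].copy()
    for (a, b), m in pair_masks.items():
        if a == i:
            masked += m
        elif b == i:
            masked -= m
    return masked

# The server only ever sees masked updates, yet their sum equals the true sum.
server_sum = sum(masked_update(i) for i in range(n_clients))
assert np.allclose(server_sum, sum(updates))
```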


Figure 1. Secure aggregation / Source: keynote for the Secure Aggregation paper

Challenges with multi-party computation

The overhead of secure aggregation creates a substantial bottleneck in scaling secure federated learning. Imagine thousands or even millions of clients in a network, each requiring pairwise masks of length n to be generated and coordinated, where the mask vectors are as big as the neural network model itself: the overhead grows quadratically with the number of users. Fortunately, the Diffie-Hellman key exchange saves the day. Instead of sending gigantic vectors between clients, this algorithm allows two parties to derive the same mask vector from a single exchanged integer each, without ever revealing the mask to the server.
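As a rough illustration of that idea, the toy sketch below (with deliberately small, insecure parameters p and g chosen only for readability) shows two clients exchanging a single public integer each, deriving the same shared secret, and expanding it with a pseudorandom generator into a full-length mask.

```python
# Toy sketch of the Diffie-Hellman idea: two clients exchange one public integer
# each, derive the same shared secret, and expand it with a PRG into a full-length
# mask vector -- so the big mask itself is never sent over the network.
# (Toy parameters only; real deployments use proper groups and key sizes.)
import numpy as np

p, g = 2**61 - 1, 5          # toy prime modulus and generator (NOT secure sizes)
n_params = 5

# Each client keeps a private key and publishes only g^key mod p.
alice_priv, bob_priv = 123456789, 987654321
alice_pub = pow(g, alice_priv, p)
bob_pub = pow(g, bob_priv, p)

# Both sides compute the same shared secret from the other's public integer.
alice_secret = pow(bob_pub, alice_priv, p)
bob_secret = pow(alice_pub, bob_priv, p)
assert alice_secret == bob_secret

# The shared secret seeds a PRG that expands into the pairwise mask vector.
def expand_mask(seed, length):
    return np.random.default_rng(seed).normal(size=length)

mask_alice = expand_mask(alice_secret, n_params)
mask_bob = expand_mask(bob_secret, n_params)
assert np.allclose(mask_alice, mask_bob)   # identical masks, never transmitted
```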


In a federated learning setup, users can drop out at any moment for several reasons, such as low battery or poor connectivity. As a result, the pairwise masks they generated with other users will not be canceled out, and the output will not be recoverable since it is still masked. Hence, the secure aggregation protocol must be robust enough to operate in environments where users can drop out at any stage of the protocol execution.
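The sketch below (same illustrative setup as the earlier masking example) shows exactly what goes wrong: when one client drops out after the masks have been agreed on, the masks it shared with the survivors no longer cancel and the server's sum stays corrupted.

```python
# Sketch of the dropout problem: if client 2 never sends its masked update,
# the pairwise masks it shared with clients 0 and 1 no longer cancel,
# and the server's sum stays corrupted by those leftover masks.
import numpy as np

rng = np.random.default_rng(1)
n_clients, n_params = 3, 4
updates = [rng.normal(size=n_params) for _ in range(n_clients)]
pair_masks = {(i, j): rng.normal(size=n_params)
              for i in range(n_clients) for j in range(i + 1, n_clients)}

def masked_update(i):
    out = updates[i].copy()
    for (a, b), m in pair_masks.items():
        if a == i:
            out += m
        elif b == i:
            out -= m
    return out

surviving = [0, 1]                       # client 2 dropped out mid-protocol
partial_sum = sum(masked_update(i) for i in surviving)
true_partial = sum(updates[i] for i in surviving)

# The leftover masks shared with the dropped client never cancel:
assert not np.allclose(partial_sum, true_partial)
```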


Figure 2. Dropout risk in secure aggregation / Source: keynote for the Secure Aggregation paper

Existing SA protocols

Over the years, several secure aggregation protocols have been proposed, each with its own strengths and weaknesses:

  • SecAgg offers strong privacy guarantees and good dropout resilience (it tolerates up to 1/3 of user devices dropping out midway through the protocol), but it has significant computation and communication costs, since each client generates shares for every other client participating in each FL round (see the sketch after this list).
  • SecAgg+ is an improved version of its predecessor; the key difference is that each client generates shares only for a fixed number of close neighbors rather than for all clients, reducing the computation and communication costs.
  • HybridAlpha employs differential privacy, a technique we will address in the next sections, together with functional encryption, which lets participants encrypt their data while a specific function can still be computed on the encrypted inputs, so the aggregator learns nothing about an individual user's data.
  • FastSecAgg introduces a novel secret sharing technique called FastShare, which lowers the computation and communication costs while maintaining a strong privacy guarantee.
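The "shares" mentioned above typically come from t-out-of-n secret sharing: each client splits the seed of its masks among the others, so that any t surviving clients can help the server reconstruct a dropped client's masks, while fewer than t shares reveal nothing. Below is a minimal, generic Shamir-style sketch with a toy field size and illustrative parameters; it is not the exact construction used by any of the protocols above.

```python
# Minimal Shamir t-out-of-n secret sharing sketch: a client splits its mask seed
# into n shares so that any t of them reconstruct it (e.g. if the client drops out),
# while fewer than t shares reveal nothing. Toy field size, for illustration only.
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a prime

def make_shares(secret, t, n):
    """Split `secret` into n shares; any t of them can reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, PRIME) for k, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

seed = 424242
shares = make_shares(seed, t=3, n=5)
assert reconstruct(shares[:3]) == seed      # any 3 of the 5 shares are enough
assert reconstruct(shares[1:4]) == seed
```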


2- Trusted Execution Environments

The second approach to performing secure aggregation is to leverage a hardware-based trusted execution environment that ensures the security of sensitive data shared by multiple parties. In the case of federated learning, the server can only examine the result of the computation, not the intermediate inputs to the computations performed in the secure area of the main processor. However, using such hardware is not straightforward, due to complicated and lengthy configuration steps as well as a lack of documentation. Trusted execution environments only allow the creation of small, fixed-size secure memory regions in order to reduce the attack surface; consequently, it is not possible to place the weights of thousands of clients in the secure region simultaneously for aggregation.


II. Differential privacy

As mentioned above, secure aggregation is a first step towards ensuring data privacy by preventing the aggregator from seeing individual client updates. However, an adversary may still be able to perform a number of attacks, such as a membership inference attack, to learn whether a record was used in the training set. Fortunately, this issue can be mitigated thanks to differential privacy, an approach for preventing information leakage. More specifically, after each local training round at the client level, noise is added to the model parameters using a random perturbation algorithm, without affecting their overall pattern. Still, such an approach brings a trade-off between privacy and model performance: the more noise you add, the less accurate the model becomes, especially for small datasets.


1- How does it work?

Generally speaking, we can think of differential privacy as a small distortion of the model updates. Consider, for example, D = {x1, x2, x3, …, xn}, a data set that contains n datapoints, and D' = {x1, x2', x3, …, xn}, a neighbouring data set to D with a small distortion (a difference of only one datapoint). When we apply the same differential privacy algorithm to both data sets, we get similar output distributions, as illustrated in Figure 3.
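Here is a small sketch of that idea using a simple noisy counting query as the algorithm (the Laplace mechanism below is a standard textbook choice, not something prescribed by the figure): releasing the noised count of D and of D' many times gives two closely overlapping distributions, so an observer can hardly tell which data set was used.

```python
# Sketch of the neighbouring-datasets idea: D and D' differ in one record, and an
# epsilon-DP mechanism (here, a Laplace-noised count) gives almost the same output
# distribution on both datasets.
import numpy as np

rng = np.random.default_rng(42)
D  = np.array([1, 0, 1, 1, 0, 1, 0, 1])   # n datapoints
D2 = D.copy(); D2[1] = 1                   # neighbouring data set: one point changed

epsilon, sensitivity = 1.0, 1.0            # a counting query changes by at most 1

def noisy_count(data):
    # Laplace mechanism: add noise with scale = sensitivity / epsilon
    return data.sum() + rng.laplace(scale=sensitivity / epsilon)

# Repeated releases from D and D' have closely overlapping distributions.
samples_D  = [noisy_count(D)  for _ in range(10_000)]
samples_D2 = [noisy_count(D2) for _ in range(10_000)]
print(np.mean(samples_D), np.mean(samples_D2))   # means differ by ~1, spreads overlap
```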


Figure 3. An example of differential privacy

More specifically, differential privacy introduces a privacy loss parameter, ε, which controls how much noise (or distortion) is added to the raw data set.

To make things more concrete, let's consider a column that contains 0 or 1 as values. For each value, we flip a coin. If it is heads, the value remains as it is. Otherwise, we flip the coin again: if it is heads, we save the value as 1; if it is tails, we save it as 0. In real-life applications, the noise-adding process is more sophisticated than a simple coin flip, and the amount of randomness is controlled through ε: as ε grows, we gain in terms of model performance but lose privacy, and vice versa.
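This coin-flip procedure is classic randomized response; a small sketch (illustrative only) shows that each reported bit is plausibly deniable, while the overall proportion of 1s can still be estimated after de-biasing.

```python
# Randomized response, exactly as described above: keep the true bit on heads,
# otherwise flip again and report heads -> 1, tails -> 0. Each reported value is
# deniable, yet the true proportion of 1s can still be estimated.
import random

def randomized_response(true_bit):
    if random.random() < 0.5:      # first flip: heads -> keep the real value
        return true_bit
    return 1 if random.random() < 0.5 else 0   # second flip decides the answer

true_bits = [random.randint(0, 1) for _ in range(100_000)]
reported = [randomized_response(b) for b in true_bits]

# De-bias: P(report 1) = 0.5 * P(true 1) + 0.25, so estimate = 2 * mean - 0.5
estimate = 2 * (sum(reported) / len(reported)) - 0.5
print(estimate, sum(true_bits) / len(true_bits))   # close, despite per-user noise
```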

In federated learning, differential privacy is achieved either by adding Gaussian noise to the global weights (known as central differential privacy) or at the client level, where each client adds Gaussian noise to its local weights (known as local differential privacy).
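A minimal sketch of the two placements of noise follows; the noise scales are illustrative and not calibrated to any particular ε. In local DP each client perturbs its own weights before sending them, while in central DP the aggregator perturbs the averaged weights once.

```python
# Sketch of the two placements of Gaussian noise: local DP (each client perturbs
# its own weights before sending) vs central DP (the aggregator perturbs the
# averaged weights once). Noise scales are illustrative, not calibrated.
import numpy as np

rng = np.random.default_rng(7)
n_clients, n_params, sigma = 10, 6, 0.1
local_weights = [rng.normal(size=n_params) for _ in range(n_clients)]

# Local DP: noise is added on-device, so even the server never sees exact updates.
noisy_local = [w + rng.normal(scale=sigma, size=n_params) for w in local_weights]
global_local_dp = np.mean(noisy_local, axis=0)

# Central DP: the (trusted) aggregator averages exact updates, then adds noise once.
global_central_dp = np.mean(local_weights, axis=0) + rng.normal(scale=sigma, size=n_params)

print(global_local_dp)
print(global_central_dp)
```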

2- Differential privacy tools in Python

There are various Python libraries that support differential privacy. For instance, Diffprivlib is an open-source library introduced by IBM, and TensorFlow Privacy is provided by Google. PyDP wraps Google's differential privacy library and contains a variety of ε-differentially private algorithms. Moreover, Facebook introduced Opacus to train PyTorch models with differential privacy.
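As a hedged example of the last option, the sketch below uses Opacus's PrivacyEngine.make_private as documented for Opacus 1.x, with a dummy model and dataset; the exact arguments may differ across versions, so treat it as an assumption to verify against the current documentation.

```python
# Hedged sketch of differentially private training with Opacus (API as of
# Opacus 1.x -- make_private() and its arguments may differ in other versions,
# so check against the current docs). Model and data here are dummies.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.1,   # how much Gaussian noise is added to clipped gradients
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```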

3- Challenges of differential privacy

  • Differential privacy is mainly applicable to large data sets, where a certain amount of model inaccuracy can be tolerated; this is much harder to achieve with small data sets.
  • There is no universally good value of ε (the privacy loss parameter). If ε = 0, we have perfect privacy, but the data is completely distorted and unusable; if ε is very large, there is hardly any privacy at all. Consequently, it is hard to find the value of ε that strikes the right trade-off.

Preserving the privacy of users' data is the main purpose of federated learning; thus, secure aggregation should be a priority in all FL systems and frameworks in order to build a successful relationship of trust with the participating clients.

Multi-party computation, trusted execution environments, and differential privacy all have their own limitations and challenges. Nevertheless, with the accelerating adoption of federated learning by companies, engineers will eventually overcome these obstacles, and federated learning will become the default setting for training ML models, especially in the healthcare and finance sectors, where data privacy matters most.

In the next episode, we will reveal our proposed Blockchain-orchestrated FL architecture. For once in history, the Joker and Batman will be associates. So, stick around! :)


