A privacy-preserving system based on Differential Privacy
Last night, I received a message from an old friend asking about my weekend plans. As soon as I tapped the reply option, my keyboard magically recommended the exact words I was looking for. This is, of course, not surprising; machine learning and predictive algorithms have been present in smartphones for quite some time. You can find predictive algorithms behind everyday activities such as replying to messages, sending emojis, and browsing websites. But it did make me wonder how I can maintain my privacy while smartphone companies collect my personal data. Differential Privacy, a practice used by Apple [1] and Google [2] to improve the user experience while maintaining user privacy, is one answer to what happens under the hood. Differential Privacy introduces noise into a dataset to obscure individual data while preserving critical insights. It's like listening to a radio station: if a station broadcasts at 92.7, we will still hear it, just with some static, if we tune in to 92.6 or 92.8. Let us delve deeper into the sea of differential privacy to understand it better and see how it can benefit us.
How Does Differential Privacy Work?
Simply put, Differential Privacy adds mathematical noise to data, which makes it difficult to pinpoint a particular individual within a large pool of data, because the outcome appears the same regardless of whether any specific individual is included or excluded. Suppose Netflix measures the overall trend in searches for "Money Heist" in Malaysia. It can then add or remove a particular user's data without affecting the general measurement of Malaysian searches.
Let's take another example. Suppose a team of 30 students is going camping, and before the trip, the event head wants to know how much money each student is carrying. Instead of asking for exact amounts, the event head can ask each student to report a value within a range (for example, within ±10 of the true amount). A student carrying $50 can report $40 or even $60, thus preserving privacy. But now the question arises: how can the results be accurate and not merely random numbers? This is where the Law of Large Numbers [3] comes in. It states, "As a sample size grows, the mean gets closer to the average of the whole." The beauty of this law is that, for a large sample, the added noise that anonymizes individual answers cancels out. Thus the event head can learn the average amount of money the students are carrying while preserving each individual's privacy. In formal terms, differential privacy compares the original database of n entries against a parallel database of n−1 entries (one individual removed). The short Python sketch below simulates the camping example.
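This is a minimal sketch, assuming each student adds uniform noise in the ±10 range; the student count and dollar amounts are randomly generated for illustration, not real data.

```python
# Simulation of the camping example: each of 30 students adds uniform
# noise in [-10, +10] to the true amount of money they are carrying.
import random

random.seed(42)

NUM_STUDENTS = 30
true_amounts = [random.randint(30, 70) for _ in range(NUM_STUDENTS)]

# Each student reports a noisy value, so no single report reveals the truth...
noisy_reports = [amount + random.uniform(-10, 10) for amount in true_amounts]

# ...but by the Law of Large Numbers, the noise roughly cancels in the mean.
true_mean = sum(true_amounts) / NUM_STUDENTS
noisy_mean = sum(noisy_reports) / NUM_STUDENTS

print(f"True mean:  ${true_mean:.2f}")
print(f"Noisy mean: ${noisy_mean:.2f}")  # close to the true mean
```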
For a more formal picture, consider two datasets, D1 and D2, where D2 consists of everything in D1 plus one additional user's data. If an analysis produces indistinguishable outputs on D1 and D2, the differential privacy condition is satisfied: whoever sees the results cannot tell whether or not that user's data was used. How indistinguishable the outputs are is quantified by the Greek letter ε (epsilon). ε is called the privacy loss, and the maximum allowed privacy loss is called the privacy budget. The noise that achieves a given ε is typically drawn from the Laplace probability distribution [4], calibrated to how much a single individual's data can change the result. Smaller ε values mean more noise and stronger privacy, while higher ε values yield less private but more accurate results.
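To make this concrete, here is a minimal sketch of the Laplace mechanism described in [4]. The counting query, its sensitivity of 1 (one person joining or leaving changes the count by at most 1), and the ε values are illustrative assumptions.

```python
# The Laplace mechanism: add noise with scale = sensitivity / epsilon.
import numpy as np

rng = np.random.default_rng(seed=7)

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    """Answer a query with Laplace(sensitivity / epsilon) noise added."""
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale)

true_count = 120  # e.g., how many users searched for a show (illustrative)

# Smaller epsilon -> larger noise -> stronger privacy but less accuracy.
for epsilon in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=epsilon)
    print(f"epsilon={epsilon:>4}: noisy count = {noisy:.1f}")
```

Running this shows the trade-off directly: at ε = 0.1 the answer can be off by tens, while at ε = 10 it stays within a fraction of the true count.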
How Differential Privacy Preserves User Privacy
Differential Privacy offers several useful guarantees, namely:
1. Quantification -
Differential Privacy can be quantified using the Laplace distribution. This allows for comparison among different techniques and algorithms, and by tuning ε, one can trade off accuracy against privacy depending on the use case.
2. Group Privacy -
Differential Privacy extends from individuals to groups by analyzing privacy loss: the privacy loss for a group of k individuals is bounded by k·ε, so meaningful guarantees can still be made for groups.
3. Composition -
Using Differential Privacy, meaningful privacy guarantees can be made even when multiple analyses are run on the same dataset. This is called composition. The privacy losses of the individual analyses add up against the overall budget, which allows complex algorithms to be designed and analyzed in terms of simpler ones (see the budget-tracking sketch after this list).
4. Post-processing Closure -
Without additional knowledge of the underlying data, no amount of post-processing of a differentially private result can increase the privacy loss ε or make the result less private. This property is known as post-processing immunity.
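Here is a minimal sketch of how an analyst might track a privacy budget under basic sequential composition, where the ε values of successive queries add up. The budget of 1.0 and the query names are hypothetical.

```python
# A toy privacy-budget accountant using basic sequential composition:
# answering queries at epsilon_1, ..., epsilon_k consumes their sum.

PRIVACY_BUDGET = 1.0  # maximum total privacy loss the data owner allows

queries = [
    ("average money carried", 0.3),
    ("count of asthma-related calls", 0.4),
    ("searches for a show", 0.5),
]

spent = 0.0
for name, epsilon in queries:
    if spent + epsilon > PRIVACY_BUDGET:
        print(f"Rejected '{name}': would exceed the budget ({spent:.1f} already spent)")
        continue
    spent += epsilon
    print(f"Answered '{name}' at epsilon={epsilon} (total spent: {spent:.1f})")
```

With these numbers, the first two queries are answered (consuming 0.7 of the budget) and the third is rejected, since 0.7 + 0.5 would exceed the budget of 1.0.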
Some common uses of Differential Privacy by Big Tech
As mentioned earlier, Apple uses Differential Privacy to improve the user experience while protecting what it learns from individual devices [1], and Google applies it to anonymize the data it collects from users [2].
Conclusion
Although Differential Privacy is a relatively new technique and research in this field is ongoing, it already creates some exciting opportunities in the world of data. Imagine emergency calls from a specific district X of Korea reporting asthma attacks. Assume data analysts use differential privacy to mask user information while looking for patterns in the emergency calls. If they notice that all asthma-related calls come from the same district X, they can investigate the underlying cause of asthma attacks among people living in that area.
Similarly, from transportation networks to police stations to hospitals, data analysts can use differential privacy to secure user data while still detecting patterns. The options are limitless. Differential privacy, when managed correctly, can result in a harmonious relationship between actionable insights and user privacy.
References:
[1] Differential Privacy Overview. https://apple.co/3shaHCX
[2] How Google Anonymizes Data. https://bit.ly/3q4ySC5
[3] Law of Large Numbers. https://bit.ly/3E7feKJ
[4] Calibrating Noise to Sensitivity in Private Data Analysis. https://bit.ly/3q1n6IC