Thoughts on Apple's 'differential privacy'
I just read this article https://www.wired.com/2016/06/apples-differential-privacy-collecting-data/ which is interesting in many respects. I find that the main argument Apple uses to convince the world that their approach to privacy is 'clean', and that you should not be scared of it, is that they use mathematical tools that guarantee anonymity (https://www.cis.upenn.edu/~aaroth/Papers/privacybook.pdf). This piqued my curiosity a little bit, as it sounded to me like the old trick of citing work by PhDs, Nobel laureates, and other reputable people to lend unquestioned credibility to whatever is being said.
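To give a concrete idea of what that "mathematical guarantee" looks like, here is a minimal sketch of randomized response, the textbook local differential privacy mechanism described in the Dwork–Roth book linked above. To be clear, this is purely illustrative and not a claim about the mechanism Apple actually deploys; the point is only that each individual report is noisy and deniable, while the aggregate statistic can still be recovered.

```python
import math
import random

def randomized_response(true_bit, epsilon):
    """Report the true bit with probability e^eps / (e^eps + 1),
    otherwise flip it. Any single report is plausibly deniable."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
    return true_bit if random.random() < p_truth else not true_bit

def estimate_true_rate(reports, epsilon):
    """Unbiased estimate of the population rate from the noisy reports."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1)
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

if __name__ == "__main__":
    random.seed(0)
    # Made-up population where 30% of people have some sensitive attribute.
    population = [random.random() < 0.3 for _ in range(100_000)]
    eps = 1.0
    reports = [randomized_response(b, eps) for b in population]
    print(f"estimated rate: {estimate_true_rate(reports, eps):.3f}")  # ~0.30
```

So the guarantee is real, but it is a guarantee about individual reports under a chosen privacy budget, not a blanket statement that nothing about you can ever be inferred.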
I have to admit first that I am not an expert in machine learning, so feel free to tell me that I am wrong about the rest of this post.
So, they pump in data, anonymize it, and then run algorithms that look for patterns; a typical unsupervised process. But, as they say, they want to deliver the best experience to you and me from this. So this means they are also doing supervised learning, and most likely refining their algorithms until they reach whatever level of precision they are after, in order to improve the behavior of their services. I am also sure that they take a lot of attributes into consideration so that they can provide me the best contextual and personal experience. Finally, if they iterate a lot - which I believe they do - they will naturally start to isolate groups of people with similar attributes.

Now, what kind of attributes are we talking about? Certainly digital data (navigation, search terms, mail, chat sessions, apps, etc.) and also physical data (location, voice, phone number prefixes, payments, etc.). While I intuitively believe that digital data can be truly anonymized, I have serious doubts about physical data. Therefore, I am tempted to conclude that at first glance their approach is truly anonymized, but after several iterations, as they segregate data based on physical information, which is less anonymous by nature, they will naturally end up with sets of identifiable data.
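To illustrate why the physical attributes worry me, here is a toy sketch (all data and field names are made up) that counts how many records share each combination of a few quasi-identifiers. Combinations that occur only once effectively single out a person, even though no name was ever stored; this is the classic re-identification problem, not anything specific to Apple's system.

```python
from collections import Counter

# Toy records with made-up quasi-identifiers: coarse location,
# phone number prefix, and a typical grocery-delivery hour.
records = [
    {"zip3": "940", "phone_prefix": "+1-415", "delivery_hour": 19},
    {"zip3": "940", "phone_prefix": "+1-415", "delivery_hour": 19},
    {"zip3": "941", "phone_prefix": "+1-650", "delivery_hour": 8},
    {"zip3": "100", "phone_prefix": "+1-212", "delivery_hour": 21},
]

# Count how many records share each attribute combination.
combos = Counter(
    (r["zip3"], r["phone_prefix"], r["delivery_hour"]) for r in records
)

# Any combination seen only once points at a single individual.
unique = [combo for combo, count in combos.items() if count == 1]
print(f"{len(unique)} of {len(combos)} combinations identify a single record")
```

The finer the segmentation on physical data, the more of these one-person combinations you get.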
How does this matter? Well, I could be apocalyptically pessimistic, or simply grumpy about annoying side effects such as ads. But it's easy to imagine that they could, for example, infer that I regularly have food delivered from a specific grocery store. And if this information falls into the hands of a malicious person, you can imagine all sorts of bad stories.
To conclude, I cannot believe Federighi's claim that they do all this "while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale." Sure, they are doing very interesting work and improving their services, but the data is not completely private.