How to keep the servers blind?
Imagine the legendary late Charlie Munger, sitting in his office, wanting to research a new stock on his Bloomberg Terminal for the latest market analysis. As a seasoned expert, Charlie knows the value of keeping his investment strategies under wraps. He didn't want anyone, not even the powerful Bloomberg Terminal, to know which stocks he is interested in. So, how did he achieve this? Private Information Retrieval.
What is Private Information Retrieval (PIR)?
If a layman were to design a PIR system how would they do it? A trivial implementation of PIR is for the client to download the whole database from the servers and query it locally. Problem solved! Technical or not, you can tell that's not a very intelligent way to do it. Why? The download costs are too high. What if data changes frequently? Do I download it every day?
The trivial solution may not be practical, but that gives us an objective function to optimise. Download/ Upload costs or the network bandwidth usage between a client and a server. Now the trick is just to find intelligent ways to maintain privacy of query parameters while we reduce bandwidth usage. Are there ways where I don't need to download the whole dump and still be able to privately query to retrieve data of my interest? Can we do it at half the cost? Can we do it at lower? The answer to all of these is YES!
Intelligent ways of implementing PIR
To keep Charlie's investment strategies under wraps, researchers have developed sophisticated cryptographic protocols to implement PIR without the hefty upload/ download costs. Popular approaches include:
How far has this tech gotten?
The latest developments on PIR have astonished the cryptography world. Alexandra Henzinger and her teammates have established protocols called SimplePIR & DoublePIR that work with a single server (thus eradicating the need to trust multiple servers not to collude), reduces the client "hint" to be downloaded before querying a 1GB database upto 16MB, has limited network costs to <240KB per query, can achieve a throughput of 10GB/s/core which is pretty close to the theoretical limit considering memory bandwidth of a machine.
The reference paper also compares all previously proposed PIR schemes like SealPIR, OnionPIR, FastPIR, SpiralStream, XOR PIR on communication bandwidth, throughput and client hint size. Check it out.
Where can we apply PIR in the real world?
PIR has the potential to turn competitors into collaborators. Getting private access to confidential data with no risk of leaking querying parameters sounds like a pretty sweet deal.
A few examples where this could be helpful:
Most businesses have implemented ML algorithms to optimise their decision making. However, they have limited 1st Party training data and that limits how accurately a ML algorithm can perform. More often than not, complimentary datasets that can give you incremental goodness sit with your competitors that operate in the same/ adjacent space.
A cryptographic protocol that can turn Enemies into Frenemies, or even Friends? Do we need more validation that advancement of tech is better for the society?
#data #privacy #cryptography #PETs #cleanrooms