How to keep the servers blind?
Credits: Dall-E | Private Information Retrieval

How to keep the servers blind?

Imagine the legendary late Charlie Munger, sitting in his office, wanting to research a new stock on his Bloomberg Terminal for the latest market analysis. As a seasoned expert, Charlie knows the value of keeping his investment strategies under wraps. He didn't want anyone, not even the powerful Bloomberg Terminal, to know which stocks he is interested in. So, how did he achieve this? Private Information Retrieval.


What is Private Information Retrieval (PIR)?

If a layman were to design a PIR system how would they do it? A trivial implementation of PIR is for the client to download the whole database from the servers and query it locally. Problem solved! Technical or not, you can tell that's not a very intelligent way to do it. Why? The download costs are too high. What if data changes frequently? Do I download it every day?

The trivial solution may not be practical, but that gives us an objective function to optimise. Download/ Upload costs or the network bandwidth usage between a client and a server. Now the trick is just to find intelligent ways to maintain privacy of query parameters while we reduce bandwidth usage. Are there ways where I don't need to download the whole dump and still be able to privately query to retrieve data of my interest? Can we do it at half the cost? Can we do it at lower? The answer to all of these is YES!


Intelligent ways of implementing PIR

To keep Charlie's investment strategies under wraps, researchers have developed sophisticated cryptographic protocols to implement PIR without the hefty upload/ download costs. Popular approaches include:

  1. Using homomorphic encryption, a technique that allows computations to be performed on encrypted data without needing to decrypt it first. This means Charlie can send encrypted queries to the Bloomberg Terminal, which processes these queries without knowing the contents, and then sends back the encrypted results for Charlie to decrypt locally.
  2. Another method is secret sharing, where the query is split into parts and sent to multiple servers. Each server processes its part of the query without knowing the whole picture. When Charlie receives the parts back, he combines them to get his answer. This way, no single server has enough information to determine what Charlie is looking for. It still remains open to the threat of these distributed servers colluding to infer what you retrieved.
  3. Another method is garbled circuits, where the server doesn't store the data in its plain form. Instead, for each data element, the server creates two garbled values. These garbled values are essentially random encodings that cloak the actual data. One encoding corresponds to the real data element, while the other acts as a dummy value. Crucially, the server itself doesn't know which garbled value represents the real data. The client uses a special key to create a garbled query that interacts with the garbled circuit. The server processes these queries within the circuit, evaluating both the real and dummy values without knowing which is which. The client then uses its key to decrypt the results, filtering out the dummy values and revealing only the actual data that matches the client's query.


How far has this tech gotten?

The latest developments on PIR have astonished the cryptography world. Alexandra Henzinger and her teammates have established protocols called SimplePIR & DoublePIR that work with a single server (thus eradicating the need to trust multiple servers not to collude), reduces the client "hint" to be downloaded before querying a 1GB database upto 16MB, has limited network costs to <240KB per query, can achieve a throughput of 10GB/s/core which is pretty close to the theoretical limit considering memory bandwidth of a machine.

The reference paper also compares all previously proposed PIR schemes like SealPIR, OnionPIR, FastPIR, SpiralStream, XOR PIR on communication bandwidth, throughput and client hint size. Check it out.


Where can we apply PIR in the real world?

PIR has the potential to turn competitors into collaborators. Getting private access to confidential data with no risk of leaking querying parameters sounds like a pretty sweet deal.

A few examples where this could be helpful:

  1. Cybersecurity: Tech companies querying another company's threat intelligence data to identify potential vulnerabilities without revealing their own security gaps.
  2. Finance: Investment firms querying transaction databases to analyse market trends without exposing their specific interests or strategies.
  3. Journalism: Journalists querying a database for sensitive political information without revealing the subject of their investigation, ensuring the safety and confidentiality of their sources.

Most businesses have implemented ML algorithms to optimise their decision making. However, they have limited 1st Party training data and that limits how accurately a ML algorithm can perform. More often than not, complimentary datasets that can give you incremental goodness sit with your competitors that operate in the same/ adjacent space.

A cryptographic protocol that can turn Enemies into Frenemies, or even Friends? Do we need more validation that advancement of tech is better for the society?

#data #privacy #cryptography #PETs #cleanrooms

要查看或添加评论,请登录

社区洞察

其他会员也浏览了