Explaining zero knowledge blockchains
How to show you know something without showing what you know
Last Friday saw the launch of Zcash, a new public blockchain and associated cryptocurrency that attracted a lot of attention. By now, there are hundreds of cryptocurrencies, so any budding young entrant needs a serious differentiator to rise above the fray. In the case of Zcash, this is easy – Zcash users can send money to each other in absolute privacy. For a cryptocurrency based on a blockchain, this is a remarkable technical achievement. (Though it should be noted that other chains such as Monero and Dash aim at the same goal using simpler but less effective means.)
As I’ve written about before, in a general sense blockchains (whether public or private) represent a trade-off in which disintermediation is gained at the cost of confidentiality. Blockchains provide a clever new way for participants to safely share a database, even if they do not trust each other, without requiring a central intermediary. But there’s a price to pay for this peer-to-peer decentralization – the “node” belonging to every participant in the chain must verify every transaction for itself, and this in turn means that it sees what everyone else is doing.
Two ways to chain
In the case of public blockchains and cryptocurrencies, the shared database serves primarily as a record of who controls (and so effectively owns) how much cryptocurrency, with an optional sprinkling of “metadata” (bitcoin) or contractual logic (Ethereum) on top. By contrast, in private blockchains, we tend to see two major classes of use case: (a) the ownership and transfer of external assets represented by tokens on the chain, and (b) more general applications relating to data storage and retrieval. For example, in our own product MultiChain, these two classes of use case are implemented using native assets and data streams respectively.
When it comes to general data storage, the blockchain provides a number of services: proving where a piece of data comes from, timestamping it, and notarizing it immutably to prevent modification by a minority of blockchain participants. But the blockchain need have nothing to say about the data itself – each application can decide what a piece of data means, and whether it is valid. Bad data can simply be ignored at the application level, without causing harm to the blockchain’s state as a whole.
By contrast, if blockchains are directly transferring tokenized assets, they must apply internal rules regarding the validity of those transfers. To put it simply, an event such as “Alice pays Bob one Euro” will only be approved by the chain if Alice has at least one Euro to her name. While different types of blockchain express this rule in different ways (bitcoin transaction constraints vs Ethereum smart contracts), they all share the property that Alice’s finances must be known by every node in the chain. This allows them to assess whether her payment is valid, know how much Bob has as a result, and evaluate any future payments from Bob to Charlie and others.
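The rule above can be sketched as a minimal account-style ledger. This is an illustration only: the `Ledger` class and its `transfer` method are hypothetical names, and neither bitcoin nor Ethereum expresses the rule in this form. What the sketch does show is that a node cannot approve Alice's payment without knowing her balance.

```python
class Ledger:
    """Toy account-style ledger; every node would hold a full copy of `balances`."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, sender, recipient, amount):
        """Apply a payment only if the sender can cover it; return whether it was approved."""
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            return False  # rejected: the node can see that the sender lacks funds
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount
        return True


ledger = Ledger({"Alice": 1})
print(ledger.transfer("Alice", "Bob", 1))    # approved: Alice has one Euro
print(ledger.transfer("Alice", "Bob", 1))    # rejected: Alice has nothing left
print(ledger.transfer("Bob", "Charlie", 1))  # approved: Bob's new balance is known too
```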
At this point, readers familiar with blockchains will point out that Alice and Bob are not directly identified by name on a chain. Instead, each transacts under one or more “addresses”, which are long alphanumeric strings of gibberish that bear no relation to their real-world identities. While this is true, in reality it does not help a great deal, because there are several ways in which the connection between users and their addresses can be inferred.
First and most simply, in order to transact with someone on a blockchain, I need to know at least one of their addresses. So if I send them some money, I can see where that money goes next, and if they’re paying me, I can see where it came from. Second, if I happen to know something about a participant from the real world (e.g. what types of assets they trade at what time of day), I can search the chain’s activity for corresponding patterns, and then infer their address with a high level of confidence. Finally, once I know one address of a participant, I can often work out which other addresses they own and use, by monitoring the full flow of funds on the chain. While this is not trivial to achieve, it is certainly possible with sufficient motivation, as proven by companies such as Chainalysis and Skry, which make a living providing this type of “network analysis” for bitcoin.
Saved by encryption?
The contrast between assets and data touches directly on the question of encryption. In the case of general data storage on a blockchain, we can encrypt the information stored, while still gaining the benefits of data provenance, timestamping and immutability. None of these features need insight into the data itself. Therefore it is perfectly valid for two participants to use a blockchain to store information which only they can read, while still gaining the benefit of other participants committing to the origin of that data and its existence at a certain point in time.
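To see why provenance and timestamping need no insight into the data, consider this minimal sketch of a hash-linked log. The `notarize` and `verify` functions are hypothetical names, the payload is assumed to have been encrypted already by the two participants, and a real blockchain would add signatures and consensus on top.

```python
import hashlib


def _entry_hash(prev_hash, publisher, timestamp, payload_hash):
    record = f"{prev_hash}|{publisher}|{timestamp}|{payload_hash}"
    return hashlib.sha256(record.encode()).hexdigest()


def notarize(chain, publisher, ciphertext, timestamp):
    """Append an opaque (already encrypted) payload, linked to the previous entry."""
    prev_hash = chain[-1]["entry_hash"] if chain else "0" * 64
    payload_hash = hashlib.sha256(ciphertext).hexdigest()
    entry = {
        "publisher": publisher,
        "timestamp": timestamp,
        "payload_hash": payload_hash,
        "prev_hash": prev_hash,
        "entry_hash": _entry_hash(prev_hash, publisher, timestamp, payload_hash),
    }
    chain.append(entry)
    return entry["entry_hash"]


def verify(chain):
    """Recompute every link; no step requires reading the payload itself."""
    prev_hash = "0" * 64
    for entry in chain:
        expected = _entry_hash(
            prev_hash, entry["publisher"], entry["timestamp"], entry["payload_hash"]
        )
        if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
            return False
        prev_hash = entry["entry_hash"]
    return True


chain = []
notarize(chain, "Alice", b"\x8f\x02\xaa", 1477958400)  # bytes only Alice and Bob can read
notarize(chain, "Bob", b"\x11\x5c", 1477958460)
print(verify(chain))          # the untouched log checks out
chain[0]["timestamp"] = 0     # someone tries to backdate the first entry
print(verify(chain))          # the tampering is detected
```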
By contrast, encryption of this nature cannot be used by transactions that represent transfers of tokenized assets. If Alice and Bob were to encrypt their transaction, then the assets in question could not be used safely by any other participant in the chain, because nobody else would know where the assets actually are. The assets would cease to have any collective meaning on the chain, which destroys the entire point.
In the finance sector, this conflict between privacy and liquidity is the core difficulty of using blockchains to transfer assets, dashing the hopes of many a startup in the space. While the technical feasibility of moving assets over a blockchain has been proven by countless pilot projects, in practice this causes too much activity to be revealed between peers. Information leakage is a disadvantage at the best of times, but it’s a complete showstopper when a chain’s participants are in fierce competition, or where regulation forbids it.
As a result, many prominent “distributed ledger” startups have moved away from the idea of on-chain settlement, reverting to more traditional bilateral transactions which are encrypted and notarized on a blockchain under the “general data storage” paradigm. This can prevent disputes and double spends, but settlement itself remains external to the chain. While the blockchain is still providing some value, it is less transformative than originally hoped. No doubt there have been more than a few red-faced meetings between startups and their investors.
And yet, after all the disappointment, salvation may finally be at hand. Enter the zero knowledge blockchain.
Introducing zero knowledge
Before discussing this new type of blockchain, it’s helpful to understand the principle of zero knowledge itself. In a general sense, a zero knowledge proof is one which demonstrates the truth of a certain statement, without revealing any additional information beyond what it’s trying to prove.
To take an example, let’s say I have a color blind friend who owns two pens, which are identical except that one is green and one is blue. My friend cannot distinguish between them, and I want to convince her that they are indeed different. Of course, I can’t do this by simply telling her the colors, because she can’t assess if I’m lying or not.
So what can I do? (Why not take a minute and try to work out the answer yourself…) Well, I can ask her to take a piece of paper, and draw two lines on it in another room. When doing this, she can freely decide whether to use the same pen for both lines, or one pen for each. From her perspective the result looks the same either way. Then she comes back in with the paper, and I tell her whether she used one pen or two. Of course, if the pens were the same color, I would have no way of knowing. So the fact that I get it right proves they are different.
Well, not quite. There’s a problem with this logic. Even if the pens were identical I would still have a 50% chance of giving the right answer, because there are only two possibilities (she used one pen or two). So one lucky guess proves nothing at all. In order to strengthen my case, the game must be played over multiple rounds. After every round, my chance of being consistently right goes down by half. So with 5 rounds, I have a 1 in 32 chance of successfully faking. With 10 rounds, it’s 1 in 1,024, and with 20 rounds, 1 in 1,048,576 – in other words, roughly one in a million. Depending on my friend’s relative level of boredom and suspicion, she can reach any probabilistic level of proof that she desires, although never absolute certainty.
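The arithmetic of the pen game is easy to check for yourself. The sketch below computes the chance of faking it and also simulates the game; the function names are hypothetical, and the simulation assumes my friend picks one pen or two with equal probability in each round.

```python
import random


def cheat_probability(rounds):
    """Chance of answering every round correctly when the pens are in fact identical."""
    return 0.5 ** rounds


def play(rounds, pens_differ, rng):
    """Return True if I give the right answer in every single round."""
    for _ in range(rounds):
        used_two_pens = rng.random() < 0.5  # my friend's secret choice
        if pens_differ:
            answer = used_two_pens          # I can simply see the colors
        else:
            answer = rng.random() < 0.5     # identical pens force me to guess
        if answer != used_two_pens:
            return False
    return True


for rounds in (5, 10, 20):
    print(rounds, "rounds: 1 in", 2 ** rounds)

rng = random.Random(42)
fakes = sum(play(5, False, rng) for _ in range(10000))
print("successful 5-round fakes out of 10000 attempts:", fakes)
```

With honest pens I pass every time, while a faker survives 5 rounds only about once in every 32 attempts.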
Bring on the snarks
Zero knowledge proofs in blockchains apply a similar principle, though of course they’re not about the color of pens. Rather, they aim to prove the statement “this transfer of assets is valid”, without revealing anything important about the transfer itself. Zcash uses a relatively new technique for zero knowledge proofs called zk-SNARKs, the full explanation of which is (to put it mildly) beyond the scope of this piece. But the basic idea is this: any computational condition can be represented by an arithmetic circuit, which takes some data as input and gives an answer of “true” or “false” in response. A zk-SNARK uses a model of this circuit to let me prove, to any desired degree of certainty, that I possess an input which gives a true response, without revealing the input itself. Philosophically at least, this is like proving that two pens are different colors, without revealing what those colors are.
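To give a feel for what “a computational condition as an arithmetic circuit” means, here is a toy sketch. It covers only the circuit, not the zero knowledge part: the gate format, the `evaluate` function, the small prime modulus and the example condition (x³ + x + 5 = 35) are all assumptions chosen for illustration, and have nothing to do with the circuit Zcash actually uses.

```python
P = 2**31 - 1  # small prime modulus for illustration; real systems use far larger fields

# Each gate is (output_wire, operation, left_wire, right_wire).
CIRCUIT = [
    ("x2", "mul", "x", "x"),
    ("x3", "mul", "x2", "x"),
    ("sum", "add", "x3", "x"),
    ("out", "add", "sum", "five"),
]


def evaluate(circuit, secret_x, target=35):
    """Run the circuit on a secret input and answer true or false."""
    wires = {"x": secret_x % P, "five": 5}
    for out, op, left, right in circuit:
        a, b = wires[left], wires[right]
        wires[out] = (a * b) % P if op == "mul" else (a + b) % P
    return wires["out"] == target % P


print(evaluate(CIRCUIT, 3))  # True, since 27 + 3 + 5 = 35
print(evaluate(CIRCUIT, 4))  # False, since 64 + 4 + 5 = 73
```

A zk-SNARK would let me convince you that I know an input for which the circuit answers “true”, without my ever telling you that the input is 3.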
A zk-SNARK uses a neat little trick to avoid the interactivity that is typical of zero knowledge proofs, in which a skeptical party repeatedly presents a challenge to the one making a claim. In the case of our pens, this challenge is my friend’s choice between using one or two pens in each round. This type of interactivity is not feasible on a blockchain because there is no trusted central party to set the challenges. Instead, a zk-SNARK uses an approximation of a “random oracle” in which the challenges are created deterministically by some code, but behave for all intents and purposes as if they were random. Not by coincidence, this combination of determinism and unpredictability uses the same kind of hash function that secures a blockchain itself.
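The general idea of replacing a skeptical challenger with a hash function can be sketched as follows, in the spirit of what cryptographers call the Fiat-Shamir heuristic. This is not the actual Zcash construction: `derive_challenges` and its parameters are hypothetical names, and the “commitment” is just a stand-in for whatever the prover publishes before being challenged.

```python
import hashlib


def derive_challenges(commitment, rounds):
    """Derive one challenge bit per round from a hash of the prover's commitment."""
    bits = []
    counter = 0
    while len(bits) < rounds:
        # Hash the commitment together with a counter to get as many bits as needed.
        digest = hashlib.sha256(commitment + counter.to_bytes(4, "big")).digest()
        for byte in digest:
            for shift in range(8):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:rounds]


# Anyone can recompute the same challenges, so no trusted challenger is needed...
print(derive_challenges(b"prover commitment", 20))
# ...yet changing the commitment changes the challenges in a way the prover cannot steer.
print(derive_challenges(b"prover commitment!", 20))
```

In pen-game terms, each bit plays the role of my friend’s choice between one pen and two, except that nobody has to be in the other room.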
Zero knowledge proofs have been around for a while, but zk-SNARKs introduce a number of innovations that render them usable in blockchains. Most importantly, zk-SNARKs reduce the size of the proofs and the computational effort required to verify them. Zerocoin, a previous attempt at using zero knowledge proofs in blockchains, requires 45 kB transactions, each of which takes half a second to check (figures taken from the white paper on which Zcash is based). This is drastically worse than bitcoin, whose transactions are typically 0.3 kB in size and can be verified in under a millisecond. By contrast, Zcash transactions weigh in at 1 kB and can be checked in under 6 milliseconds. This puts Zcash in the same scalability league as bitcoin – a remarkable achievement. If we took our hats off to the creator(s) of bitcoin, we should take our socks and shoes off for this.
Caution advised
Before you convert all your bitcoin to Zcash, there are some caveats to bear in mind. First, Zcash’s cryptography relies on a trusted setup process, in which two long public keys are derived from a single randomly generated private one. It is absolutely vital that this private key is destroyed, since anyone who possesses it can forge the proofs on which the system relies. In the case of Zcash, the private key was created in an elaborate ceremony, described in detail here. The ceremony involved several well known characters from the cryptocurrency world, each of whom (we are told) had only a partial view of the private key. In turn, this means that Zcash can only be compromised if all of the ceremony’s participants collude maliciously. It is up to the reader to decide how confident they feel about that.
Second, even though it is relatively quick to verify an anonymous Zcash transaction, creating each of these transactions carries a serious computational burden. According to the Zcash Speed Center, it currently takes 48 seconds and over 3 GB of memory on a high-end server. This makes it impractical to transact anonymously from mobile devices and older desktops and laptops. Zcash partially works around this limitation by supporting both regular visible cryptocoins (with fast transactions) and anonymous “notes” (with slow ones), with a built-in method for converting between the two.
Third, even if we assume that the underlying cryptography is sound, there could be bugs lurking in the Zcash code which allow anonymous notes to be conjured out of thin air. This would allow the Zcash monetary base to be limitlessly inflated, ultimately rendering the cryptocurrency worthless. Unlike in transparent cryptocurrencies such as bitcoin, this catastrophic event could not be detected, because the entire point of Zcash is keeping transactions hidden. Nonetheless, according to Zooko Wilcox, the Zcash CEO, work is already under way to find a solution, so we can look forward to seeing it.
Finally, as with any cryptocurrency based on proof-of-work, the potential for 51% attacks remains. This means that a group of “miners” with over half of the network’s computational power can collude to reverse transactions that everyone else thought were complete (bad miners still cannot fake transactions which steal others’ funds). Zcash smartly relies on Equihash, a different hashing algorithm from bitcoin’s SHA-256, meaning that the huge mass of existing bitcoin mining power cannot be turned against Zcash. Equihash is also designed to be more resistant to the “ASICs” (special-purpose chips) that have turned bitcoin mining into an oligopoly, but only time will tell if hardware engineers can find a workaround, and at what cost.
Zero knowledge private blockchains
So far, we’ve focused our discussion on the public Zcash blockchain and cryptocurrency. But what about external assets moving over private or permissioned blockchains and shared ledgers? Can the same zero knowledge techniques be used?
On a technical level, the answer is undoubtedly yes. Compared with the theoretical and technological tour de force that underlies Zcash, it’s trivial to extend the protocol to support assets issued on a chain. All that’s required is to extend the conditions proven by a zk-SNARK to enforce the preservation of multiple assets, instead of a single cryptocurrency. Or even more simply, create multiple distinct anonymous subsystems on a single blockchain, each representing a different type of asset, and transact within each subsystem exactly as Zcash does today. This second method would require no understanding of zk-SNARKs at all.
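The condition that such an extended proof would enforce is simple to state in the clear. Here is a minimal sketch of it, assuming each note is an (asset, quantity) pair; the function names are hypothetical, and in a real zero knowledge system this check would be expressed inside the circuit, so the asset types and quantities would never be seen by the nodes.

```python
def totals(notes):
    """Sum the quantities of a list of (asset, quantity) notes, per asset type."""
    result = {}
    for asset, quantity in notes:
        result[asset] = result.get(asset, 0) + quantity
    return result


def preserves_assets(inputs, outputs):
    """True if every asset type leaves the transaction in the quantity it entered."""
    if any(quantity <= 0 for _, quantity in inputs + outputs):
        return False  # zero or negative notes could be used to mint assets
    return totals(inputs) == totals(outputs)


# Splitting a 5 EUR note while passing 3 USD along unchanged is fine.
print(preserves_assets([("EUR", 5), ("USD", 3)], [("EUR", 2), ("EUR", 3), ("USD", 3)]))
# Turning euros into dollars, or 5 EUR into 6 EUR, is not.
print(preserves_assets([("EUR", 5)], [("USD", 5)]))
print(preserves_assets([("EUR", 5)], [("EUR", 6)]))
```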
How would an asset’s life cycle look in this model? First, a trusted entity issues tokens representing the asset, by sending a visible blockchain transaction certifying those tokens’ value. The same entity would then perform a second transaction which converts the visible tokens into anonymized Zcash-style “notes”, effectively moving the asset underground. These notes can then be secretly transferred from the issuer to others, and onwards among the chain’s participants. As with Zcash, the transfer transactions can be verified as valid by all blockchain participants without revealing their content. Finally, when a holder wishes to redeem a note, they convert it back into visible tokens using another Zcash-style transaction, send those tokens to the original issuer, and receive the equivalent real-world asset in return. We might also allow notes to be directly redeemed anonymously, in which case blockchain participants would not know how much of the asset remains in circulation.
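The life cycle above can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyAssetChain`, its methods and the hash-based note fingerprints are hypothetical, and the sketch checks notes in the clear where a real system would verify a zero knowledge proof instead.

```python
import hashlib
import secrets


def commit(amount, secret):
    """Public fingerprint of a note; it hides the note while the secret stays private."""
    return hashlib.sha256(f"{amount}|{secret}".encode()).hexdigest()


class ToyAssetChain:
    def __init__(self):
        self.visible = {}         # address -> visible token balance
        self.commitments = set()  # fingerprints of all anonymous notes ever created
        self.spent = set()        # fingerprints of notes already consumed

    def issue(self, issuer, amount):
        """Step 1: the issuer creates visible tokens."""
        self.visible[issuer] = self.visible.get(issuer, 0) + amount

    def shield(self, owner, amount):
        """Step 2: turn visible tokens into a note; the returned secret stays off-chain."""
        if amount <= 0 or self.visible.get(owner, 0) < amount:
            raise ValueError("insufficient visible tokens")
        self.visible[owner] -= amount
        return self._new_note(amount)

    def transfer(self, amount, secret):
        """Step 3: consume one note and create a fresh one for the recipient."""
        self._consume(amount, secret)
        return self._new_note(amount)

    def unshield(self, owner, amount, secret):
        """Step 4: turn a note back into visible tokens, ready for redemption."""
        self._consume(amount, secret)
        self.visible[owner] = self.visible.get(owner, 0) + amount

    def _new_note(self, amount):
        secret = secrets.token_hex(16)
        self.commitments.add(commit(amount, secret))
        return secret

    def _consume(self, amount, secret):
        # ASSUMPTION: a real chain would verify a zero knowledge proof here,
        # and would never see `amount` or `secret` as this toy does.
        fingerprint = commit(amount, secret)
        if fingerprint not in self.commitments or fingerprint in self.spent:
            raise ValueError("unknown or already spent note")
        self.spent.add(fingerprint)


chain = ToyAssetChain()
chain.issue("issuer", 100)
issuer_note = chain.shield("issuer", 100)
bob_note = chain.transfer(100, issuer_note)  # the issuer hands Bob the new secret privately
chain.unshield("bob", 100, bob_note)
print(chain.visible)
```

Note that the chain’s public state between shielding and unshielding consists only of fingerprints, which is the “underground” phase described above.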
So zero knowledge transactions promise to untie the Gordian knot which has prevented blockchains from being used for settlement in the finance sector. To recap, in a regular blockchain transaction, when an asset is sent from one bank to another, the details of that transaction are visible to every other bank on the chain. By contrast, in a zero knowledge transaction, the others only know that a valid transaction has taken place, but nothing about the sender, recipient, asset class (if we’re clever) or quantity. Even the volume of transactions can be obfuscated by participants regularly creating fake transactions in which they send assets to themselves.
In terms of privacy, this is as good as a gold bar travelling in a briefcase from one bank to another, but without the cost and time of physically moving the gold. And it’s better than using a trusted intermediary such as a custodial bank, because there isn’t even that single party who sees everything going on. For the first time, zero knowledge blockchains allow asset transfers to be digitally performed on a peer-to-peer basis, in perfect secrecy.
Don’t throw out that database (yet)
Assuming that Zcash’s technical fundamentals are sound, I fully expect it to reach the top tier of cryptocurrencies in terms of developer interest and market capitalization. But is there a similarly bright future for zero knowledge transactions in private blockchains? Will they make the transition from the laboratory to production-quality systems moving real money around the world?
It is, of course, far too early to tell. But there are a number of questions that need answering before permissioned blockchain advocates can point to zero knowledge transactions and triumphantly declare victory.
First, and most importantly, is this safe? Can we really be confident that both the underlying cryptography and its coded implementations are strong enough to prevent a malicious party from generating assets out of thin air? As mentioned earlier, unlike transparent blockchains, it is not yet possible to detect if the monetary base of a zero knowledge blockchain has been compromised. Still, there is no surer test of this technology than releasing it as an open public blockchain that is available for all to see and attack, and this is exactly what Zcash is doing. After several years of seeing Zcash running smoothly, institutions may become convinced that zero knowledge blockchains can genuinely safeguard their assets. As with all matters blockchain, patience is required.
A related issue is the novelty of zero knowledge cryptography itself. It’s true that regular blockchains rely on advanced cryptography – namely, asymmetric cryptography (public/private keys) and cryptographic hash functions (digital fingerprints). And it’s also true that the great majority of blockchain programmers and application developers don’t understand the mathematical principles which underlie these techniques. But the broader point is this: if treated as black boxes, these methods have been widely employed for decades, by a huge number of developers and users (heard of https?) and everyone believes that they work. By contrast, until recently zero knowledge proofs were only known to a small community of academics, and didn’t have broad applications on the Internet or elsewhere. We can expect this obscurity to reduce the willingness of a bank’s CIO or risk officer to move their core processes to zero knowledge blockchains, at least for the next five years. And let’s not even start to imagine how long it will take regulators to get comfortable with assets moving around in this way.
Talking about regulation brings up another practical issue with zero knowledge blockchains. Anonymous transactions in a blockchain contain statements regarding asset transfers and ownership, but those statements are only visible to selected parties (namely, those directly involved). Even if we give a regulator full visibility into a zero knowledge blockchain and its participants’ identities, it has no way of knowing what is truly happening within. Of course, the regulator could ask all of the participants to identify and reveal their transactions, and they can do this efficiently using Zcash-style “viewing keys”. Nonetheless, if the parties to any particular transaction want to keep it secret, the regulator is stuck, and does not know who to fine. There is no custodial bank from whom it can obtain the full picture, and the only option for enforcement is to shut down the entire chain.
So what’s the bottom line? For now at least, I suggest simply following the progress of the public Zcash blockchain, to see how it develops and grows. If the history of Ethereum is repeated, there will be surprises and vulnerabilities lurking under the surface, waiting to be exploited by greedy opportunists. Nonetheless, in the longer term, make no mistake: zero knowledge transactions are a game-changing breakthrough for blockchains. If the underlying cryptographic principles prove sound, expect them to significantly broaden the range of use cases to which blockchains can be applied.