This Before That

This article is ostensibly about why the challenge space in an interactive zero knowledge proof has to be large. Understanding this rather obscure theoretical aspect of zero knowledge proofs is quite rewarding intellectually. I promise.

Let me start with a trivial question. How do you convince yourself that something happened before something else?

Here’re some possible answers.

  1. You literally see Event-X happen. You wait for a bit. You then literally see Event-Y happen. You know that X happened before Y because you saw it yourself. You are convinced because you trust your memory of what you have seen.
  2. Sometimes, things are physically structured such that Y cannot happen before X. Let’s say I see you wearing socks and shoes, you could not have worn shoes before wearing socks.
  3. Someone you trust tells you that Event-X happened before Event-Y and you believe them.

These are so intuitive that you don’t bother to reflect on this till someone specifically asks you this question. In fact, the 2nd example I gave above is used in many magic tricks – you expect a certain order because of obvious structure, but the magician circumvents that order to enthrall you with his magic. One such trick is where the magician pretends to cut an unpeeled banana with an imaginary knife, and then peels the banana to reveal a precise cut in the same location on the inner fruit. The magic works because it belies the natural order of events that you are used to.

Now that I have asked this seemingly trivial question on ordering, let’s formalize these three modes of what proves order: Witness, Structure, or Trusted Third Party.

  1. Witness: You personally witnessed something happen before something else.
  2. Structure: The structure of the two events is such that one cannot happen before the other. This could be due to physics, or the natural fixed dependency between two events, like birth and death.
  3. Trusted Third Party: A newspaper, a notary, an atomic clock which knows “real time”, etc.

Now that we have set the building blocks of our discussion, we can get into the main question. Let’s say Alice sees 2 events happen in succession. She wants to convince Bob that this is the case, but Bob wasn’t there with her at the same time. Alice is convinced because of mode #1: she was herself a witness to the order. Bob was not there at that time. So, mode #1 is ruled out. How does Alice convince Bob of order using mode #2 or mode #3?

Structure

If there is structure between these events, it’s self-evident to Bob that Alice is right. No real proof is required. But structure is not as straight forward as birth/death, socks/shoes, peeling/cutting a fruit, etc. Bob has to make sure that he verifies the structure thoroughly to see if Alice is playing any tricks. For example, in the pre-cut banana magic trick, Bob must look for a tiny pin-sized hole in the banana peel to see if Alice the great magician inserted a needle in the banana, moved it around to cut the fruit before beginning the trick. This would tell Bob that the banana was pre-prepared, and the structure assumption doesn’t hold anymore. Structure works, but only if Bob can convince himself that the structure was not tampered with by Alice. That there are no magic tricks or shoe-contortions that allows Alice to wear her socks after she has worn her shoes.

Trusted Third Party (TTP)

TTP’s work, but there’s just not enough of them around. We have newspapers, atomic clocks, notaries, and such, but they don’t cover every event that we are interested in. How does Alice get a TTP to notarize that event X happened before event Y?

Here’s one approach: Alice waits for a TTP to output a signed timestamp. The TTP doesn’t even know that Alice exists. A newspaper, for example, prints a copy on physical paper everyday. This counts as a signed timestamp. Alice then combines this signed timestamp and Event-X as two inputs to event Y. Bob sees event Y, and is convinced the output of Event-Y happened after Event-X. Bob is convinced because he sees that the timestamped newspaper is an input to Y, and that means that even X should have been available before Y. This is similar to how they show movie-kidnappers proving that their hostage is alive at certain time. They make a videotape of the hostage holding that day’s edition of a newspaper. In this example, the newspaper is the TTP; the hostage being alive is Event-X. The hostage holding the newspaper, being videotaped is equivalent to the timestamp and Event-X being given as inputs to Event-Y.

In fact, this approach is a slightly modified take on mode #2 (structure). Alice is using the newspaper in the video to convince Bob that the newspaper was printed first. Alice then made a video with the newspaper clearly visible in it, and these two events could have happened only in that order. The only two ways it could have happened in the opposite order are:

  • Alice predicted in advance what the newspaper frontpage would look like in the future. She then prints that newspaper herself, and uses it in the video. The newspaper is eventually published exactly like she had predicted, and Bob is now convinced. Alice is unlikely to be able to pull this off because she doesn’t know the future.
  • Alice is friends with the newspaper editor, and this editor will print a headline in the future that Alice has asked him to. That way, she can print her own copy of the newspaper with this headline, make the video and when the real newspaper is printed with this pre-determined headline, Bob can be fooled into thinking that the video was made after the newspaper was printed. But this scenario contradicts our assumption that the newspaper is a trusted third party, and cannot be manipulated by either Alice or Bob.

Interactive Proofs

With this background, we now get into interactive proofs. It so happens that interactive proofs rely on Alice and Bob’s shared acceptance that Event-X happened before Event-Y because Alice and Bob together orchestrate these events and are both witnesses to it. In the “Where is Waldo” example from our previous article on Zero Knowledge Proofs, we saw that Alice does Event-X first (places a piece of paper on the picture, called a “commitment”), Bob later comes in and gives Alice a challenge (either open the paper and show the image inside, or cut a hole in the paper to reveal just Waldo through the paper). Alice then does what Bob asks of her, and Bob can verify the response (called the “response”) They can repeat this sequence a few times for Bob to be convinced that Alice knows where Waldo is. Alice and Bob repeat this sequence of “commitment-challenge-response” (randomly changing the commitment and challenge each time) a few times and if it works each time, Bob knows that it’s overwhelmingly likely that Alice knows where Waldo is.

The catch is – only Bob can be convinced of this. If Bob goes to Carol and shows her this sequence of “commitment-challenge-response” triples, Carol doesn’t have to be convinced. Carol knows that Bob can create a similar sequence by doing the slightly different ordering of “challenge-commitment-response” (commitment and challenge are swapped) and trick her. In the Where is Waldo example, Bob makes up a valid commitment to both his possible challenges. Here’s how.

  • Challenge #1. Reveal that the picture under the paper is the original Where is Waldo image. If Bob wants this to be the challenge, the equivalent commitment is random. The paper can be anywhere on the image.
  • Challenge #2. Cut a hole in the paper and reveal Waldo. If Bob wants this to be the challenge, he make a new picture with just one image of Waldo in the middle, covers it with the paper, and then cuts it open to reveal Waldo.

In both these cases, Bob doesn’t know where Waldo is in the original picture. So, without actually knowing where Waldo is, Bob can write down a sequence of “Commitment-Challenge-Response”, by generating the challenge first, and not the commitment.

Why Order Matters

When Alice and Bob were doing the interactive proof, Alice could not trick Bob this way, because Bob saw in person that Alice did the commitment first, and then had to respond to Bob’s challenge without changing the commitment. Bob knows the ordering of “commitment-challenge-response” because he was there, and saw it happen. But he can never convince Carol of this because Carol also knows that Bob can create such a sequence himself without knowing where Waldo is, by just changing the order of “commitment-challenge-response” to “challenge-commitment-response”. Even if Bob is honest, he cannot convince Carol that Alice proved it to him that Alice knows where Waldo is. Carol can only be convinced if Alice and Carol do the interactive proof between just the two of them, and Carol can be witness to the ordering of “commitment-challenge-response”.

Challenge Space

The Where is Waldo proof doesn’t work as a general interactive proof because Bob cannot use it to convince Carol that Alice knows the secret. This is because the sequence “commitment-challenge-response” relies on mode #1 order proof (witness). What if the “commitment-challenge-response” sequence has some structure that proves that the commitment happened before the challenge? That could convince Carol that the sequence was generated in the right order, and not the cheating-order. In the Where is Waldo proof, there is no obvious way of imposing structure on the ordering between the commitment and the challenge.

In other interactive proofs (like the Schnorr protocol), Bob’s challenge is just a large number. Alice has to do some arithmetic operations with her previous commitment and Bob’s challenge number to come up with a response that satisfies Bob. In these kinds of proofs, it is possible that we could impose some structure between Alice’s commitment and Bob’s challenge such that it’s obvious to Carol that Bob could not have created the challenge before Alice made her commitment.

One obvious example of such a structure is a secure hash function like SHA256, where Bob’s challenge is the hash digest of Alice’s commitment. Carol knows that Bob cannot first come up with a challenge and then make up Alice’s commitment – because the hash function cannot be inverted. In such a proof system, Bob always makes up his challenge by hashing Alice’s commitment. So, the sequence is “commitment-hash(commitment)-response”. The challenge is always hash(commitment). If Bob now takes a series of such triples to Carol, Carol is sure that these triples were generated honestly by either Alice and Bob using an interactive proof, or Bob knows the secret and was able to prepare the proof himself. Either way, Carol can accept the proof. This is how digital signature schemes are constructed, where Bob can show Carol that Alice signed something, and Carol will believe that Bob is not lying (if Carol can independently confirm what Alice’s public key is).

So…..

We touched upon many concepts of Zero Knowledge Proofs in an informal way here. C-Simulatability in Sigma Protocols and the Fiat-Shamir heuristic, mostly. There are graduate level text books on these topics, and I have barely scratched the surface here.

Asymmetric power, reversed

What is power, really?

Power comes about when someone has the ability to destroy someone else’s accumulated capital.

What is capital, then?

Capital

Capital comes about as a result of raw materials, labour, and time. A healthy body, stored grains, a house to live in, a bank account with money earned through a job, or any sort of owned property – all of that is capital. Capital comes about as a combination of raw materials, labour, and time. Capital can either be consumed directly by its owner, or can be exchanged for other things like the labour of others.

If Bob has accumulated capital like this, and Alice has the ability to either destroy this capital, or make its value go down to zero, or coercively move the ownership rights of the capital from Bob to a new owner, I claim that Alice has power over Bob. In some primitive societies, Bob is never allowed to even accumulate capital because Alice controls Bob’s labour and time as well. The capital Bob creates is directly accumulated into Alice’s “account” short-circuiting Bob’s “account” entirely. This is sometimes called slavery.

The modern power dynamic is mostly as a result of Bob having accumulated capital to show for his investment of raw materials, labour, and time – all of which actually are physically not recoverable after the capital has been accumulated. The only thing Bob has to show for his raw materials, labour, and time is the accumulated capital. If Alice has the ability to take this capital away from Bob, Bob is rightfully afraid of Alice’s power.

The key thing to note is that the power is quick and efficient to wield. If Alice’s use of her power takes her as long as it took Bob to accumulate capital, there is no point in using power. Alice could as well generate her own capital, which is more efficient than going through the extra step of waiting for Bob to generate his capital and then confiscating it. Power is meaningful only when it’s quickly and efficiently wieldable. We can look at a few examples.

  • A gun is powerful because the shooter can quickly and efficiently end the life of the a victim. The victim, on the other hand, would have spent a lifetime of raw material, labour, and time to grow a healthy body.
  • Police have power because they can easily jail people, thereby nullifying the past and future labour/time of their “victims”.
  • Governments have power because they can take a share of their citizens’ accumulated capital in the form of taxes. Tax collection is inherently quicker and more efficient than generating capital.

There is an inherently asymmetric nature to power. Power is wielded quickly and efficiently on capital that is slow and inefficient to accumulate. It seems like this is almost natural. To counter this natural emergence of power in society, we have designed social mechanisms to keep these powers in check. Every once in a while, these mechanisms break, and we see humans resort to their more basic instincts.

Cryptography

Cryptography – through the magic of arithmetic and large numbers, reverses this very natural seeming asymmetric nature of power. Every primitive in cryptography reverses this power dynamic: Encryption, digital signatures, hash functions, zero knowledge proofs, multi-party-computation, you name it. Each of these lets Bob quickly and efficiently secure information, and makes it very difficult for Alice to undermine that security. It’s almost surreal to watch it in action.

Let me give a simple example from Elliptic Curve Digital Signature Algorithm, which secures ownership rights in Bitcoin. Bob randomly generates a private key, which is a very quick and efficient task for any computer, and somewhat quick and efficient even with paper, pencil, and a coin. The private key looks like this:

0xc2cdf0a8b0a83b35ace53f097b5e6e6a0a1f2d40535eff1cf434f52a43d59d8f

From this private key, Bob generates its corresponding public key, which is made of two numbers, and looks like this:

0x6fcc37ea5e9e09fec6c83e5fbd7a745e3eee81d16ebd861c9e66f55518c19798,

0x4e9f113c07f875691df8afc1029496fc4cb9509b39dcd38f251a83359cc8b4f7

The key thing to note is that the process of generating the public key from the private key is quick and efficient. In this case, a special well known number has to be repeatedly squared the private key number of times to generate the public key. Repeatedly squaring to calculate exponents is quick and efficient. Reversing this process, that is, taking the logarithm of the public key to recover the private key, seems to be very hard, and there are no known easy ways of doing it.

Another example is the famous RSA algorithm for encryption and digital signatures. Here, the private key is made up of two large prime numbers, and the public key is the multiplicative product of these two numbers. The two private key numbers look like:

p = 101565610013301240713207239558950144682174355406589305284428666903702505233009

q = 89468719188754548893545560595594841381237600305314352142924213312069293984003

The public key p*q = 9086945041514605868879747720094842530294507677354717409873592895614408619688608144774037743497197616416703125668941380866493349088794356554895149433555027

Given p and q, it’s quick and efficient to generate p*q, but given p*q, it’s not known how to quickly and efficiently calculate p and q.

As the numbers get larger, the degree of asymmetry between the creation of security and the breaking of it gets harder.

How does it all come together?

Just having some arithmetic be easy from one direction and hard from the other direction is not enough. These structures have to be useful to secure information. Wise cryptographers have come up with clever tricks to secure information using these arithmetic asymmetries. That these arithmetic asymmetries are opposite to the more naturally occurring power asymmetries in society is poetic justice of sorts. Mull over that for a second: the asymmetric nature of physical power where creation was expensive, but destruction was cheap – has been changed to a new order where creation is cheap, but destruction is expensive. All thanks to the quirks of how numbers work together. Just good old numbers.

To tie this to Bitcoin – we just have to understand that Bitcoin transforms capital to information. What used to be the best form of capital in the analog world (gold), is now digital. To prevent capture of analog capital (gold), owners used to build vaults and fortresses to make the oppressor’s power wielding inefficient. To prevent the capture of digital capital, all owners have to do is perform some arithmetic.

Zero Knowledge

“Zero Knowledge”, contrary to what it sounds like, is actually quite interesting and fun. It might even be a solution to our long standing problem of validating the world’s transactions without a trusted third party or government or central bank. If you Google for the terms Zero Knowledge and Blockchains, you will be flooded with whitepapers, articles, explainers, investment advice, and everything in between.

What does Zero Knowledge (ZK) even mean? Let me start with a toy example, and then we can work our way up to world peace.

Say we both get the same newspaper and it has a Sudoku puzzle in the games page. I claim to you that I know the solution to this puzzle, but will not tell you what the solution is. Being the frenemy that you are, you won’t believe me, obviously. Can I prove it to you, beyond reasonable doubt, that I do know the solution to this Sudoku puzzle, without telling you what the solution is? More formally,

  • If I know a solution, I should be able to convince you of that. Without leaking any knowledge about the solution.
  • If I lie about knowing the solution, I should be caught – with overwhelming probability.

If both the above are possible, that would be a Zero Knowledge Proof of Knowledge of a Sudoku puzzle. There are some ingenious ways of doing this, which rely heavily on cryptographic primitives and protocol design. In fact, it’s possible to convince an audience that you know the solution for almost any puzzle without giving them any hint of the solution itself. Sudoku was just one example. You could prove that you know the solution to a crossword puzzle, or the Rubick’s cube, or that you know a cycling route from New York to Seattle that’s exactly 5000km, or that you have paid your rent, or that your bank balance is more than $10,000 or any such statement really – without actually revealing the actual solution to the statement.

Imagine the power of such a system, where you could convince others that something is true, without revealing how it is true. In most real world systems, including financial systems, to prove something to someone, you have to reveal the actual facts of the matter – and thereby reveal more than you have to.

For example, getting a visa to any country requires you to provide your bank statement – just to prove that you can afford the trip. It should be possible to prove that you can afford the trip, without revealing any financial information. Also, the proof should be real – as in, if you cannot afford the trip, you shouldn’t be able to prove such a thing and fool the visa-issuing agency. We want both sides, the prover and the verifier, to win. Just with no leak of extra information. The best kind of privacy, if you will.

Where is Waldo?

First, I will give an example of how such a Zero Knowledge protocol looks like, to make you believe that it’s possible. Below is Waldo: Say Hi to him.

Waldo is somewhere in the amusement park image below. Can you find him? Don’t try too hard, it’s not worth it.

This “Where is Waldo” puzzle lends itself very well to a Zero Knowledge protocol. I can prove it to you that I know where Waldo is without revealing his actual location on the image. How do I do that? We run the following protocol between the two of us.

  1. You blindfold yourself. I keep a large white sheet of paper on top of the amusement park image, and ask you to remove your blindfold. You can give me one of two challenges. You should choose these challenges randomly.
    1. I should remove the sheet of paper and show you the amusement park image underneath.
    2. I should cut out a small hole in the white paper right above where Waldo is on the image. If I do this, I must know where Waldo is.
  2. Repeat step #1 till you are satisfied.

Why does this protocol work?

  1. If I know where Waldo is, I can easily answer challenge #2. That part is easy. It’s not so easy to figure out why challenge #1 is required.
  2. I could cheat by keeping some other image under the paper which has just many images of Waldo on it. How do you know that it’s actually the amusement park image and not some other image that I made up? Challenge #1 to the rescue. If you had asked me challenge #1, I had to remove the entire paper and show you that this was the amusement park image in question.
  3. Note that you cannot give me both the challenges at the same time, as that would tell you where Waldo is. Only one challenge per protocol round.

If we do this entire exercise just once, you could have asked me to answer challenge #2 and I could still cheat with a probability of 50%. If we do it twice successfully, I can still cheat with a probability of 25%. If we do it three times, it reduces to 12.5%. If we do it 10 times, and you picked your challenge randomly each time, I can cheat only with a probability of 0.1%. If we repeat this 20 times, the cheating probability drops to 0.0001%. And so forth, exponentially. Again, this only works if you pick your challenge randomly. If I know in advance that you will ask me the challenge sequence, of say, 122212121222111 – I can pass all challenges easily. The protocol works only if I am unable to guess your challenge sequence.

Cryptographic researchers have proven that almost any statement can be proven in zero knowledge. Imagine that! Any statement! It’s one of the most celebrated results in theoretical computer science, all the way back from 1986. The concept of Zero Knowledge Proof itself was introduced in 1985, after the original paper was rejected in major scientific conferences in the prior years because of how absurd the idea sounded. It still sounds counter-intuitive, if you ask me.

One popularly used ZK-proof system, solving a very specific problem, is that of Digital Signatures. When you digitally sign a document, you are proving to the verifier that you know a secret key to your public key (which the verifier already knows, or is tied to your identity, or some such). For the longest time, general purpose ZK-systems, which could prove any statement, were just theoretical results – the actual proofs themselves can be quite unwieldy and inefficient. Theoretical work continued, but there were still no practical applications that needed these proofs to get smaller, or easier to understand, or even remotely workable. 25 years went by, and people were mostly happy with either revealing everything about something to prove it, or having a trusted third party (like a Bank Officer or Notary) signing a statement saying that something is true, without revealing the underlying details. Ho-hum.

Enter Bitcoin!

Bitcoin removed the trusted third party from financial transactions. Or at least, introduced the idea that it could be done with clever cryptography and protocol design. Researchers who were toiling away in obscure labs and universities were suddenly like: “Hey, there are these amazing theoretical cryptography results from decades ago, let’s use them”. These ideas suddenly seemed ripe for more R&D to make them practical. And boy did the researchers and engineers deliver! Here’s a short list of how Zero Knowledge pervades the cryptocurrency space.

  1. New cryptocurrencies: Zcash, Monero, Grin, Beam, Mina, etc.
    • Everything about a transaction is hidden. Who is paying. Who is the recipient. What is the amount. Everything is hidden. Crucially though, verifiers can verify that the transaction is valid, and no one is cheating anyone. Zero knowledge magic. Details differ, but this is the general idea.
    • Additionally, Zero Knowledge proofs can verify large numbers of transactions without needing to store all those transactions. So, these ZK-blockchains can be as small as a few KB. For comparison the Bitcoin blockchain is 350GB and growing. Ethereum’s blockchain is 1TB or 5TB (depending on whom you ask) and growing.
  2. Layer-2: ZK-Sync, StarkNet, etc. bring the benefits of ZK-proofs to legacy blockchains like Ethereum and increase throughput quite dramatically.
  3. Other Proofs: Exchanges can use ZK-proofs to convince their users that they are not doing fractional reserve or rehypothecation shenanigans, and in fact, do custody all their customer assets.

What next?

Some of these general purpose ZK-systems have quite advanced cryptography, and their security guarantees are proven sometimes under ideal settings. When I say security guarantees, what I mean is:

  • Can the prover cheat?
  • Can the verifier learn something by violating the zero knowledge principle?
  • Can we do the entire thing without relying on cryptographic assumptions?
  • Some systems rely on an initial ceremony where some trusted party has to do one-off computation. Can we remove such requirements?

Practical minded people say that this stuff is too advanced, or “moon-math” as they call it. These primitives will not make it to Bitcoin for a LONG LONG time, if at all. Bitcoin’s cryptography is from an even older generation, and has been vetted in traditional settings like e-commerce, national defense, etc. No moon-math for Bitcoin!

That doesn’t mean that Bitcoin won’t benefit from these new developments. Bitcoin has evolved to a place now where the core protocol itself won’t change that easily, but additional features have to be built on top, in other layers. ZK-proofs will reside on a secondary layer somewhere on top.

Ethereum, on the other hand, is more open to these ideas. ZK-proofs are making their way into Ethereum’s core-system slowly, but will definitely pervade Ethereum’s Layer-2 ecosystem quite thoroughly in the near future. Much faster than in Bitcoin, from what I can see. Newer blockchains will go all-in, and will be built around ZK-ideas, or will offer them as native operators or subroutines.

You have the entire spectrum of blockchain platforms – some boringly conservative, and just trying to be sound money. Some others on the bleeding edge of maths, offering true privacy through ZK-proofs and the like. I expect these to become more mainstream as privacy becomes non-negotiable. Currencies, smart contract platforms, exchanges, and every other financial intermediary will go maths-first!