Upgrade 2021: CIS LAB Speakers

September 21, 2021 // Upgrade 2021: CIS LAB Speakers

Public-Coin Time and Space-Efficient Arguments from Groups of Unknown Order

Justin Holmgren, Scientist | NTT Research Cryptography & Information Security

This is joint work with Alex Block, Alon Rosen, Ron Rothblum, and Pratik Soni. I’ll just jump right in. Today I’m going to be talking about interactive arguments, which are protocols where a computationally bounded prover is trying to convince a verifier of some claim about a computation. In the tradition of complexity theory, we model this as some string x being in a language L. We have two parties, a prover P and a verifier V, that exchange short messages with each other, and at the end the verifier outputs either accept or reject. The two basic properties of any proof system are completeness, which says the verifier should accept true claims, and soundness, which says no dishonest, potentially cheating prover should be able to make the verifier accept false claims.
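As a reference point, one standard way to write these two properties down is the following, with negl denoting a negligible function in the security parameter:

```latex
\textbf{Completeness:}\quad x \in L \;\Longrightarrow\; \Pr[\langle P, V\rangle(x) = \mathsf{accept}] = 1.

\textbf{Soundness (computational):}\quad x \notin L \;\Longrightarrow\;
\forall\ \text{poly-time}\ P^*:\ \Pr[\langle P^*, V\rangle(x) = \mathsf{accept}] \le \mathrm{negl}(\lambda).
```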

So today my focus is going to be on three desirable properties of argument schemes. The first one is efficiency: you want your protocol to be as efficient as possible. If the underlying computation takes time T and space S to verify on your own, then the prover should also take roughly time T and space S to convince the verifier. The second requirement is transparency, which means that the verifier should not rely on any secret randomness to decide whether to output accept or reject. By secret randomness I mean there shouldn’t be any randomness that, if an adversary got its hands on it, would allow the adversary to convince the verifier of false statements. And finally, as a theoretical cryptographer, I always care about the computational assumptions that a construction is based on.

Good assumptions mean different things to different people, but for today it will just mean falsifiable and well-studied computational assumptions. So let’s take a look at what was known. There are basically three main approaches to constructing argument schemes. One is to start with something called a SNARK and do some recursive composition and bootstrapping. This gives you the first two properties, but in terms of computational assumptions it’s not so great: it relies either on somewhat exotic knowledge assumptions or on a random oracle heuristic. And it uses the random oracle in a less benign way than many other random oracle constructions; it really uses it in a non-black-box way, which is a little more suspicious in terms of security.

Another approach, which is the most closely related to the one I’m talking about today, is to build arguments out of something called an interactive oracle proof (IOP). This can be done based on any collision-resistant hash function. But to date, the best known constructions in this line of work have required the prover to use space proportional to the time of the computation it’s trying to prove, which can quickly become a bottleneck. That’s usually where papers benchmarking proof systems stop their graphs: when the prover runs out of memory. The last approach I want to quickly mention starts with multi-prover interactive proofs and compiles those into arguments. These are based on fully homomorphic encryption and require the verifier to use secret randomness. So, as I’ve been alluding to, in this work we’re going to try to get all three of these things: a transparent argument system with good time and space efficiency, based on a very well-studied hardness-of-factoring assumption.

There’s a little asterisk next to this, and I’ll get to that in a bit when I discuss our results in more detail. But yes, the most closely related line of work is the one based on IOPs. In a little more detail, let me give a theorem statement of sorts. The assumption is that it’s hard to factor products of safe primes. Under this assumption, for any language L that can be verified in time T and space S, there’s an argument for L where the prover runs in time nearly T and space nearly S. The verifier runs in near-linear time, and the verifier is public-coin except for a reusable first message, which you can think of as a common reference string that requires some secret randomness.
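Spelled out, the statement is roughly the following (an informal paraphrase; the exact polylogarithmic factors are in the paper):

```latex
\textbf{Theorem (informal).}\ Assuming it is hard to factor products of safe primes, for every
language $L$ verifiable in time $T$ and space $S$ there is an interactive argument for $L$ in which
the prover runs in time $T \cdot \mathrm{polylog}(T)$ and space $S \cdot \mathrm{polylog}(T)$, the
verifier runs in time near-linear in the input length, and the verifier is public-coin apart from a
reusable first message (a common reference string).
```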

Oh, sorry, I was a little behind on the slides. However, if you make a different but factoring-related assumption, a hidden-order assumption in ideal class groups, then you can get a scheme where even the first message is totally transparent. These are the assumptions from now on, but you don’t actually need to know exactly what they are; we’ll just reference the work that builds on them. Like I said, the most similar line of work builds arguments from interactive oracle proofs, so I’ll quickly recap what those are. They’re a slightly different model: you still have a prover and a verifier, but now the prover sends long proof strings that the verifier doesn’t even have time to look at in full.

The verifier just makes a small number of queries to this proof string, maybe sends some kind of random message to the prover, and they repeat. The nice thing about this model is that there is a way to compile this information-theoretic primitive into an interactive argument where all the messages are succinct. Instead of the prover sending the long proof string across, it sends a short commitment. The verifier tells the prover which positions of the long string it’s interested in, the prover sends those values back and proves they are consistent with the commitment, and they repeat. One of our observations is that interactive oracle proofs are kind of the wrong abstraction for the type of efficiency we want. There are no known IOPs where you can actually generate these prover messages in time T and space S. However, there are IOPs where the messages have enough structure that you can commit and open to them in time T and space S. And committing and opening is all the argument prover needs to do, so that gives you the argument scheme that we want.
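To fix ideas, here is a minimal, self-contained sketch of that standard "commit, then open queried positions" compilation, with a Merkle tree standing in as the vector commitment (an illustration of the generic pattern, not code from the paper):

```python
import hashlib

def h(*parts):
    return hashlib.sha256(b"|".join(parts)).digest()

def merkle_commit(leaves):
    """Return all tree levels; the root (last level) is the short commitment."""
    levels = [[h(x) for x in leaves]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def merkle_open(levels, i):
    """Authentication path for leaf i: the sibling hash at every level."""
    path = []
    for level in levels[:-1]:
        path.append(level[i ^ 1])
        i //= 2
    return path

def merkle_verify(root, i, leaf, path):
    cur = h(leaf)
    for sib in path:
        cur = h(cur, sib) if i % 2 == 0 else h(sib, cur)
        i //= 2
    return cur == root

# Prover: a long IOP proof string (power-of-two length for simplicity).
proof_string = [bytes([b]) for b in range(16)]
levels = merkle_commit(proof_string)
root = levels[-1][0]                      # the short message actually sent

# Verifier: asks for a few positions instead of reading the whole string;
# the prover opens those positions against the commitment.
for i in [3, 9]:
    leaf, path = proof_string[i], merkle_open(levels, i)
    assert merkle_verify(root, i, leaf, path)
```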

So, just to overview our techniques. We leverage some structure of these IOP messages. Specifically, we leverage that they have a tensor structure, or, if you prefer, you can think of them as multivariate polynomials. These messages have a more succinct description: think of the list of coefficients, or the evaluations on some subset of the domain. And it turns out we can compute this succinct description in time T and space S. This was a very messy thing to write down in one of our previous papers. The second part is constructing efficient commitment schemes for these kinds of messages.
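To make the tensor structure concrete, here is a tiny hypothetical illustration (not the paper's code) of why it helps: an inner product against a query vector of the form u1 ⊗ u2 ⊗ u3 can be evaluated by contracting one factor at a time, so the full-length query vector never has to be written down.

```python
import numpy as np

# Hypothetical illustration: a message v of length k^3 is viewed as a k x k x k
# tensor, and we take its inner product with a tensor-structured query
# u = u1 (x) u2 (x) u3 without ever writing u out in full.
k = 4
rng = np.random.default_rng(0)
v = rng.integers(0, 10, size=k**3)
u1, u2, u3 = (rng.integers(0, 10, size=k) for _ in range(3))

# Naive way: explicitly build u (length k^3) and dot it with v.
u = np.kron(np.kron(u1, u2), u3)
naive = int(u @ v)

# Structured way: contract one mode at a time. Intermediate results have size
# k^2, then k, then 1, so the full query vector is never materialized.
t = v.reshape(k, k, k)
t1 = np.tensordot(u1, t, axes=(0, 0))    # sum_i u1[i] * t[i, :, :]
t2 = np.tensordot(u2, t1, axes=(0, 0))   # sum_j u2[j] * t1[j, :]
structured = int(np.tensordot(u3, t2, axes=(0, 0)))

assert naive == structured
```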

So what we’ll do is start with a proposal by Bünz et al., which turns out to be buggy. We give a quite non-trivial fix and generalization; when we notified the authors of our work, we found out that they had discovered the bug independently. Along the way, we introduce some new techniques for batching proof-of-knowledge statements. These two things are what we’re going to focus on, not so much the IOP properties. And rather than talking specifically about polynomials, we’ll focus on these more general tensor commitments. So what’s that? Well, first there’s the commitment part, which you can think of as just some collision-resistant hash function. Then, on top of that, there’s a proof system. The proof system lets you say things like: I know a message that’s consistent with this commitment.

And here’s a linear function applied to the message, and I can tell you the value of that linear function. Here we’ll only have restricted forms of linear functions that actually have succinct descriptions, namely this tensor structure of the query vector u. From this primitive you can recover a polynomial commitment by just picking specific values of these vectors u_i. What we get in terms of tensor commitments is that under either of our assumptions, you get a good tensor commitment. I really don’t want to focus too much on the efficiency aspect, even though that, in some sense, is the point of the work, because once we give the candidate proof system, it’s going to be relatively easy to check the efficiency. The hard part is actually proving that the proof system is sound.
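Written out, the kind of claim the tensor-commitment proof system supports is roughly the following (notation following the talk):

```latex
\text{``I know a (short) message } v \text{ such that } C = \mathsf{Com}(v) \ \text{ and }\
\langle\, u_1 \otimes u_2 \otimes \cdots \otimes u_d,\ v \,\rangle = y \text{''},
```

where the query is specified succinctly by u_1, ..., u_d rather than by all of its entries; choosing the u_i to be the appropriate evaluation vectors recovers a polynomial commitment.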

So the idea is to start with a homomorphic hash function of sorts, which turns out to be the only place where we use any computational assumptions, and then to describe a proof system for that hash function. So what is this homomorphic collision-resistant hash function? It takes as input a sequence of integers and outputs a group element. What we know how to construct is not something that’s collision resistant on the entire domain of all sequences of integers, but rather something that’s collision resistant on short sequences: sequences where every entry is bounded.

The homomorphic properties that we’ll need are actually twofold. Given two evaluations of this hash function, H(X) and H(Y), we want to be able to compute H(X + Y), which is just the group operation in the target group, and we also want to be able to compute H of the concatenation of the two strings X and Y. I’ll denote this homomorphic evaluation of concatenation with a circle-inside-a-circle notation. Constructing such a hash function is where we need the assumptions: if we assume that factoring products of safe primes is hard, then we can get a hash function with a secret-coin setup, and the more complicated assumption about class groups gives us a hash function with a public-coin setup.
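As a concrete toy, here is one plausible shape for such a hash, in the spirit of DARK-style integer commitments (an illustrative guess, not the exact construction from the paper, and the tiny hard-coded modulus is of course insecure): encode the integer sequence in base q and exponentiate.

```python
# Toy sketch (illustrative only; parameters far too small to be secure).
# H maps a sequence of small integers to a group element by encoding the
# sequence as one integer in base q and exponentiating. A real instantiation
# would use an RSA group with a safe-prime modulus, or an ideal class group
# for a public-coin setup, with much larger parameters.
N = 3233          # toy "RSA" modulus (61 * 53)
g = 5             # a fixed group element
q = 10**6         # base; collision resistance only holds for entries << q

def encode(xs, q):
    """Integer encoding x_1 + x_2*q + x_3*q^2 + ... of a sequence."""
    return sum(x * q**i for i, x in enumerate(xs))

def H(xs):
    return pow(g, encode(xs, q), N)

x = [3, 1, 4, 1]
y = [2, 7, 1, 8]

# Additive homomorphism: H(x + y) is just the group operation on H(x), H(y).
assert H([a + b for a, b in zip(x, y)]) == (H(x) * H(y)) % N

# Concatenation homomorphism: H(x || y) = H(x) * H(y)^(q^len(x)),
# computable from H(x) and H(y) alone, given len(x).  (x + y below is list
# concatenation, not elementwise addition.)
assert H(x + y) == (H(x) * pow(H(y), q**len(x), N)) % N
```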

Great. Do these two homomorphic properties make sense to everybody? Okay. So we’re going to let H be a hash function like this for the rest of the talk. What we want to prove is a statement like this: the prover knows a vector that satisfies this hashing constraint and this linear inner-product constraint. And there’s basically one proof system that has popped up in a million different flavors across cryptography, and it’s Sumcheck.

It has two basic steps. First, we start with the big claim that we’re trying to prove and split it into some number of smaller claims. In this case, we can think of the vector we’re proving knowledge of as having sub-parts: the prover sends a commitment to each sub-part, along with the inner product of each sub-part with a smaller vector. Because of the homomorphic properties of the hash function, the verifier can check that the sub-commitments are consistent with the original commitment. And because of the tensor structure of the query vector we’re taking the inner product with, the verifier can check that the claimed inner products are consistent with the original one.
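Here is a small schematic of one such splitting step, reusing the toy base-q hash from the previous sketch (an illustration, not the paper's exact protocol): the prover splits v into a left and a right half, commits to each half, reports each half's inner product with the remaining tensor factors, and the verifier checks consistency using the concatenation homomorphism and the tensor structure.

```python
# Schematic sketch of one "split" step (illustration only).
# Claim: the prover knows v with C = H(v) and <u1 (x) u', v> = t.
N, g, q = 3233, 5, 10**6

def H(xs):
    return pow(g, sum(x * q**i for i, x in enumerate(xs)), N)

def inner(a, b):
    return sum(x * y for x, y in zip(a, b))

# Prover's secret message and the structured query u = u1 (x) u_rest.
v = [3, 1, 4, 1, 5, 9, 2, 6]
u1 = (2, 7)
u_rest = [1, 0, 2, 1]
vL, vR = v[:4], v[4:]

# Original claim.
C = H(v)
t = u1[0] * inner(u_rest, vL) + u1[1] * inner(u_rest, vR)   # = <u1 (x) u_rest, v>

# Prover's message for this round: sub-commitments and sub-inner-products.
C_L, C_R = H(vL), H(vR)
t_L, t_R = inner(u_rest, vL), inner(u_rest, vR)

# Verifier's consistency checks (no access to v needed):
# 1. The sub-commitments recombine to the original commitment, via the
#    concatenation homomorphism H(vL || vR) = H(vL) * H(vR)^(q^|vL|).
assert C == (C_L * pow(C_R, q**len(vL), N)) % N
# 2. The sub-claims recombine to the original inner product, via the
#    tensor structure of the query.
assert t == u1[0] * t_L + u1[1] * t_R
# The protocol then recurses on the two smaller claims (C_L, t_L) and (C_R, t_R).
```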

So step one is splitting one claim into multiple smaller claims. Step two is reducing the number of claims: we have several claims and want to get back down to one. Just to illustrate how this fits together: you start with one very big claim, you run the protocol where the prover sends something and the claim splits into many claims, then you reduce, and you just repeat. After a logarithmic number of rounds, you end up with one very small claim that the prover can prove directly. Each round has low overhead, so the whole protocol has low overhead. The problem is that we actually don’t know how to do the step where you reduce from two claims to one.

This kind of reduction is very common in Sumcheck-style proofs, but in proofs of knowledge it’s trickier, or at least in this setting of proofs of knowledge it’s trickier. Okay. So we’ll do something a little bit related, but first I want to abstract away some of this mess and just think of the mapping that takes a vector v to its hash together with its inner product as an abstract homomorphism between two groups. We don’t know a two-to-one reduction, but there’s a sort of canonical candidate proof system that, if you’re familiar with Sumcheck, you would try first, and I want to walk through why that doesn’t quite work.

So you have your arbitrary homomorphism, and the prover claims it knows two pre-images under the homomorphism. What the verifier might try to do is take a random linear combination of these claims: it samples some random vector, say from some domain, which specifies a linear combination, and it asks the prover to prove that it knows a pre-image for that linear combination. The problem is that this just doesn’t work. The prover might know a pre-image of, for example, two times y_1, but not of y_1 itself. Then if r_1 happens to be even, the prover just got lucky and is able to convince the verifier.
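Concretely, writing the target group additively: suppose the prover knows z with F(z) = 2·y_1 (but no pre-image of y_1 itself) and knows x_2 with F(x_2) = y_2. Whenever the challenge r_1 is even, the prover can answer with

```latex
x' := \tfrac{r_1}{2}\, z + r_2\, x_2, \qquad
F(x') = \tfrac{r_1}{2}\, F(z) + r_2\, F(x_2) = r_1\, y_1 + r_2\, y_2,
```

so it convinces the verifier with probability about one half without knowing a pre-image of y_1.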

The paper we’re building on, the Bünz et al. paper, actually does prove a sort of computational soundness for a particular group and a particular H. But it fails to prove knowledge of a short pre-image, and that’s actually vital for the whole scheme to work, because this hash function is only collision resistant for short inputs.

>> Audience Member: A quick question.

>> Yeah.

>> Audience Member: What is the definition of short?

>> So it means that each integer entry is bounded by some fixed value.

>> Audience Member: Okay. So there could be many entries, they just can’t individually be large.

>> Yeah, the L-infinity norm is the right way to think about it. Okay. So we don’t know how to do this two-to-one reduction, but we can do a 2K-to-K reduction, sort of. Instead of keeping around two claims, then one claim, and so on, we’ll just scale things up a little. The high-level picture still looks the same: split into a large number of claims at the beginning, and then go back and forth, reducing and splitting. There’s some small additional poly(K) overhead from this, but as a theorist, K is a statistical security parameter, so poly(K) overhead is fine with me. Okay. This 2K-to-K reduction I think is pretty fun, so if you haven’t been paying attention, I would advise paying attention now. The initial claim is, again, that the prover knows 2K different pre-images under this homomorphism F.

The way that we reduce is actually kind of what you would expect: you take many linear combinations now, instead of just one. It’s actually going to be important for our proof of security that we take zero/one linear combinations. The claim that we reduce to is knowledge of a short X′ such that X′ is a pre-image of this new set of linear combinations. To prove soundness, what we want to show is that if you have a prover that convinces you in this protocol, then given a constant number of accepting transcripts of this prover, you should be able to extract the long vector X. Each individual transcript has only a small amount of communication, but by taking many of these transcripts, you should be able to extract the longer thing.
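Schematically, one round of the reduction looks like the following minimal sketch, where the abstract homomorphism F is instantiated as a toy componentwise exponentiation just so the code runs (in the talk, F bundles the hash together with the inner product):

```python
import random

# Toy stand-in (illustration only): F maps an integer vector to a tuple of group
# elements, componentwise g_i^{x_i} mod N, and is a homomorphism from (Z^n, +)
# to the target group.
N, gens = 3233, [5, 7, 11]

def F(x):
    return tuple(pow(g, xi, N) for g, xi in zip(gens, x))

def combine(images, row):
    """Homomorphically compute F(sum_i row[i] * x_i) from images[i] = F(x_i)."""
    out = [1] * len(gens)
    for bit, y in zip(row, images):
        if bit:
            out = [(o * yi) % N for o, yi in zip(out, y)]
    return tuple(out)

K = 4
# Prover's 2K secret short witnesses and their public images y_i = F(x_i).
xs = [[random.randint(0, 3) for _ in range(len(gens))] for _ in range(2 * K)]
ys = [F(x) for x in xs]

# Verifier's challenge: a random K x 2K zero/one matrix A.
A = [[random.randint(0, 1) for _ in range(2 * K)] for _ in range(K)]

# The K new claims: knowledge of a pre-image of each homomorphic combination.
# The honest prover's new witnesses are the rows of A applied to the x_i, and
# they stay short because A has zero/one entries.
for a in A:
    x_new = [sum(bit * x[j] for bit, x in zip(a, xs)) for j in range(len(gens))]
    assert F(x_new) == combine(ys, a)
```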

In order to actually prove that you can extract here, you have to prove a somewhat odd, purely mathematical lemma, which is that this random matrix actually has an integer left inverse. This turned out to be somewhat complicated for us to prove, but I want to give a quick taste of it. We consider a sequence of lattices, and we use the fact that the matrix is left invertible over the integers if the lattice spanned by its rows is all of Z^K. So we consider this random sequence of lattices as we add random vectors one at a time, and we show that the sequence rapidly approaches, and actually becomes equal to, all of Z^K.
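Informally stated, the lemma is roughly:

```latex
\textbf{Lemma (informal).}\ Let $A \in \{0,1\}^{m \times K}$ be uniformly random with
$m = c \cdot K$ for a suitable constant $c$. Then, with high probability, the rows of $A$ generate
all of $\mathbb{Z}^K$; equivalently, there is an integer matrix $B \in \mathbb{Z}^{K \times m}$
with $B A = I_K$, i.e.\ an integer left inverse.
```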

The way we prove that, surprisingly enough, turns out to be by looking at the determinant of the lattice. We need to show that the determinant goes to one; that’s sufficient to establish the claim. To do that, we look at the prime factorization of the determinant, and we observe that each prime power has roughly a one-half chance of being killed each time you add a new random zero/one vector to the lattice. Once you have that, you can show that after about some constant times K additions, the lattice is all of Z^K. So that’s the lemma; it’s very clean and abstract. Once we have this lemma, we can use it to make the extraction go through.
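Here is a small self-contained simulation of that argument (a sketch, not code from the paper): add random zero/one vectors one at a time, maintain a triangular integer basis of the lattice they generate, and watch the lattice's index in Z^K, i.e. its determinant, drop to one.

```python
import random

def ext_gcd(a, b):
    """Extended gcd: returns (g, s, t) with s*a + t*b == g == gcd(a, b), g >= 0."""
    if b == 0:
        return (abs(a), 1 if a >= 0 else -1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def add_to_basis(basis, v):
    """Fold integer vector v into an echelon basis (dict: pivot column -> row),
    keeping the generated lattice unchanged (each 2x2 update is unimodular)."""
    K = len(v)
    v = list(v)
    for j in range(K):
        if v[j] == 0:
            continue
        if j not in basis:
            basis[j] = v
            return
        b = basis[j]
        g, s, t = ext_gcd(b[j], v[j])
        new_b = [s * b[k] + t * v[k] for k in range(K)]                      # pivot becomes g
        new_v = [(b[j] // g) * v[k] - (v[j] // g) * b[k] for k in range(K)]  # entry j becomes 0
        basis[j], v = new_b, new_v
    # If v is reduced to all zeros, it was already in the lattice.

def index_in_ZK(basis, K):
    """Index [Z^K : L] of the lattice spanned so far (None until full rank).
    With a triangular basis this is the product of the pivot entries."""
    if len(basis) < K:
        return None
    prod = 1
    for j in range(K):
        prod *= abs(basis[j][j])
    return prod

K = 8
random.seed(1)
basis = {}
for m in range(1, 6 * K + 1):
    add_to_basis(basis, [random.randint(0, 1) for _ in range(K)])
    idx = index_in_ZK(basis, K)
    if idx is not None:
        print(f"after {m:2d} vectors: index of the lattice in Z^{K} is {idx}")
    if idx == 1:
        print("the rows now generate all of Z^K, so the stacked matrix has an integer left inverse")
        break
```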

Oh, sorry, the slide is messed up; I can probably fix it in about ten seconds. Alright. So let’s recall: this is a reduction protocol where you start with 2K knowledge claims and reduce to K, via multiplication by a random matrix. One very silly thing you could try is to just hope that this matrix A has a left inverse. That’s completely silly, because the dimensions are just wrong.

For a matrix to have a left inverse, it has to be tall and skinny, and this matrix, because we’re trying to reduce the number of claims, is short and fat. But if you did have a left inverse, you could compute X from the X′ that you actually see in the transcript. So instead of just hoping that this matrix with the wrong dimensions has a left inverse, we use our lemma. We rewind the prover and get some constant number of accepting transcripts. If you stack these transcripts, it’s as if you now have a tall, skinny matrix, as if you ran the protocol with a tall, skinny matrix.

And now it actually is reasonable to hope that this stacked matrix has an integer left inverse, and then you can make the whole extraction go through. That’s exactly what our lemma says.

Okay, so just to conclude. We constructed the first transparent arguments for NP that are time- and space-efficient and based on well-studied, falsifiable cryptographic assumptions. We found and fixed a bug that was pretty non-trivial to fix. And we developed tools for working with groups of unknown order that will likely be more broadly applicable. Also in our paper, as a side contribution, is an improvement to the popular proof-of-exponentiation protocol, but that’s for another time, I guess. So thanks.

Justin Holmgren

Scientist | NTT Research Cryptography & Information Security

Justin Holmgren is a Scientist at NTT Research, studying the foundational theory of cryptography and its interplay with diverse areas of computer science. His work has advanced the feasibility of securely outsourcing computation, private information retrieval, and software watermarking. He earned his PhD (2018), MEng (2015), and SB degrees (2013) in Computer Science from MIT, as well as an SB degree (2013) in Mathematics. Before joining NTT Research, Dr. Holmgren was a postdoctoral researcher at Princeton University and at the Simons Institute at UC Berkeley.