Upgrade 2021: CIS LAB Speakers


The t-wise Independence of Substitution Permutation Networks

Stefano Tessaro: Associate Professor | University of Washington

Transcript of the presentation The t-wise Independence of Substitution Permutation Networks, given at the NTT Upgrade 2021 Research Summit, September 21, 2021.

I'm an associate professor at the University of Washington, but I want to point out that I was also a visiting researcher here in this lab over the summer. I enjoyed my visit and the hospitality in this very same building very much.

So I would like to tell you today about some very recent joint work with Tianren Liu, who is a postdoc at the University of Washington, and Vinod Vaikuntanathan at MIT. The title is a little bit technical, but this is really work that at its core is asking what I consider to be one of the most fundamental questions in understanding how cryptographic security is bootstrapped. At its core, what we are interested in here is a classical question about the security of block cipher algorithms like the Advanced Encryption Standard, or AES for short.

Now, AES, just for context, I guess everyone here knows, but I don't know who else is in the audience at home or remote: it's really the most widely used cryptographic algorithm in practice, and I mean that by just the sheer number of CPU operations being spent running it in the real world. The fact that it's a block cipher means that it's an encryption algorithm which is meant to encrypt short blocks of 128 bits into ciphertexts of the same length using a secret key; in fact, it's a permutation once you fix the key. It was standardized in the early 2000s. It's so important that it's implemented directly on the chip. And it's the basic building block behind several methods for encrypting and authenticating data. Now, because of its importance, the algorithm clearly has a huge target on its back. It had better be secure, otherwise we would really run into trouble.

So the question, of course, is: is it actually secure? You would think that the answer is a clear, resounding yes, if the algorithm is so important. But I think it's known to many here in the audience that despite years of study, more than two decades, we don't have the answer we would like to have, especially as theoretical cryptographers. This could be quite surprising at first, but I want to give you a little bit of context, also because some of this work departs from the standard theoretical crypto type of work. So I want to give you a little bit more context before I tell you about our contributions in this space and how we got them. First of all, in general, as theoretical cryptographers, when we say that something is secure we typically mean that a particular scheme comes with a proof of security which takes the form of a reduction.

For example, for AES we would like to have a clear theorem that tells us that if a certain computational problem is hard, then AES is secure as a pseudorandom permutation, which is the typical assumption we make on AES. Unfortunately, we really don't know of any meaningful hard problem we can leverage to prove the security of AES, and the same story is true for basically all practical designs of block ciphers that are considered in the real world. So what theoretical cryptographers have done instead, over the years, is come up with all sorts of different constructions of good pseudorandom permutations that are based on standard, well-studied assumptions. Unfortunately, some of these designs are not that bad in terms of efficiency, but they are nowhere near as practical as AES, so they don't end up being used in the real world except for some specific applications.

Now, in order to understand why AES is hard to analyze, let me tell you a little bit more about how it works, at least at a high level. Many of you have seen this, but I just want to introduce the right abstraction to think about it for the rest of the talk. First of all, with some caveats (if you're familiar with this, you know what I'm talking about; otherwise, feel free to ignore them), AES is what we call a Key-Alternating Cipher. In its general form, a Key-Alternating Cipher encrypts an N-bit block for some parameter N, and it works by XORing a part of the secret key into the input and then applying some fixed permutation which is part of the block cipher. It's a known function, there are no secrets, and it's efficiently computable in both directions; there's no clear hardness here. And then we keep iterating this over and over for R rounds, and finally we add the final piece of key and get the output.
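To make the round structure concrete, here is a minimal Python sketch of a generic key-alternating cipher, under my own illustrative naming; the placeholder permutation below is not an actual cipher component.

```python
import os

def key_alternating_cipher(x, round_keys, perm):
    """Generic key-alternating cipher: XOR in a round key, apply a fixed
    public permutation, repeat, then XOR in a final whitening key.
    perm is keyless and efficiently invertible; all secrecy is in the keys."""
    for k in round_keys[:-1]:
        x = perm(x ^ k)
    return x ^ round_keys[-1]

# Toy usage with a placeholder 128-bit rotation as the permutation
# (illustrative only; this rotation is linear and totally insecure):
rotl = lambda v: ((v << 1) | (v >> 127)) & ((1 << 128) - 1)
keys = [int.from_bytes(os.urandom(16), "big") for _ in range(11)]
print(hex(key_alternating_cipher(0x01, keys, rotl)))
```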

So AES is a particular instance of this with 10 to 14 rounds, depending on the key length, and N is 128, but other ciphers are also Key-Alternating Ciphers. The keys themselves that we're adding, think of them as one very long key for the purpose of this talk. In practice they are generated heuristically via a key scheduling algorithm from a short key. In fact, we can be a little bit more specific about AES. AES is a special type of Key-Alternating Cipher that we call a Substitution Permutation Network. What that means is that if we look inside the actual permutation, we'll see that it has a very specific structure where we take the input and we split it into blocks. It's a little bit small there on the slide, but there's a parameter B which is the size of these blocks. And then each block is scrambled independently using what we call a substitution box, which is a small permutation on B bits.
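In code, one round of such a network might look like the following sketch; sbox and mix are hypothetical stand-ins for the substitution box and the linear mixing layer.

```python
def spn_round(state, round_key, sbox, mix, k, b):
    """One substitution-permutation round on a (k*b)-bit state: add the key,
    apply the same b-bit S-box to each of the k blocks independently, then
    apply a linear mixing layer to the whole state."""
    state ^= round_key
    mask = (1 << b) - 1
    out = 0
    for i in range(k):
        out |= sbox((state >> (b * i)) & mask) << (b * i)
    return mix(out)  # linear over GF(2): easy to invert, no hardness here
```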

Again, efficiently invertible, so there's no hardness here. But it's highly non-linear; it's a very carefully crafted function. And then these blocks are mixed via a mixing layer, which is a linear function. Again, something very easy to invert, no hardness here. For example, for AES the parameter B is eight, so we are looking at individual bytes, and the S-box, again, I'm simplifying a bit, but basically it's the inverse function in a finite field. So you think of B-bit strings as elements in a finite field (indistinct) and then you compute the inverse, except that zero maps to zero. So the bottom line here is the following: if we think about this as cryptographers, as complexity theorists, we have this round structure which is very simple, but it's very hard to pin down something which is really hard about it in a meaningful way.
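For concreteness, here is a minimal sketch of that patched inverse, assuming the AES reduction polynomial x^8 + x^4 + x^3 + x + 1; since the multiplicative group of GF(2^8) has order 255, the inverse of x is x^254, and 0^254 = 0 conveniently implements the zero patch.

```python
def gf256_mul(a, b):
    """Multiplication in GF(2^8), reduced mod x^8+x^4+x^3+x+1 (0x11B)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def patched_inverse(x):
    """x -> x^(-1) in GF(2^8), with the convention 0 -> 0."""
    r = 1
    for _ in range(254):  # x^254 = x^(-1) by Fermat's little theorem
        r = gf256_mul(r, x)
    return r

assert patched_inverse(0) == 0
assert gf256_mul(0x53, patched_inverse(0x53)) == 1  # sanity check
```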

So if you want to design a task which is hard for any computationally bounded adversary, there's nothing clear. I mean, you can maybe come up with something extremely artificial, but then it's not useful for actually getting a proof of security for AES. Yet somehow, when we iterate this, we get something which looks secure. To me it's always been sort of a failure of provable security that we cannot prove anything like this. The reduction-based approach cannot tell us anything, yet these are ideas that have been around since Shannon about how we should build a secure encryption mechanism, and we cannot really say much about them. So for context, in order to say something about these constructions, we basically have two types of results. The first type of result is what I call here ideal-model analyses.

This is some work where I was involved myself over the years. Basically, we don't look at the actual construction of the block cipher. We just assume that the component, either the big permutation in a Key-Alternating Cipher or the S-box, the small permutation, is a random permutation. It's ideal; the adversary can make queries, accessing it as an oracle, and then we prove security in that model. It's sort of a first layer of validation, but of course it's still disappointing in the general scheme of things. First of all, it only covers generic attacks that invoke this component as a black box. And also, for the specific case of S-boxes, and there have been a few papers recently on this, it's really a very weak guarantee, because the security only holds for adversaries that cannot evaluate the S-box entirely, but the S-boxes have tiny domains, so you get security against adversaries that run in time 256 or something like that.

It's not very meaningful. You would like to infer something, but formally nothing follows from that. The other thing that people have been doing is, of course, cryptanalysis. That's the most mainstream way of analyzing these algorithms. People have come up with attacks, old ones, new ones, and tried to analyze and see whether they can break AES. And despite some initial optimism back in the day, this cryptanalytic effort has been extremely unsuccessful, and here unsuccessful means it's a good thing, right? There was no real progress, unless you look at specific settings, like side channels and so on, for the basic secret-key setting. The best attack we have, as of now, is a speedup by a factor of four over brute-force key search, which is rather pathetic in terms of what it achieves. Which is impressive, right? Because when AES was designed, we knew much less about cryptanalysis of this type of algorithm.

Still, I guess as theoretical cryptographers this leaves us a little bit disappointed. We would like to do more, and of course we can't do everything, we cannot come up with a full proof of security, but something we try to promote here is at least to come up with proofs of security that cover large classes of attacks. Instead of proving security against all attacks, we prove security against classes of attacks, and security is really proven for the actual algorithm, not for an idealized counterpart as in prior works. And again, if you look at the overall picture of attacks that have been considered, we're definitely not the first ones to try this. There exists a substantial body of work on analyzing specifically linear and differential cryptanalysis, which are very restricted classes of attacks.

So here we want to do something more general. It's of course not the most general thing, but what we've been doing is to consider the notion of limited independence: T-wise independent permutations, and the weaker notion of 2-wise or pairwise independence, which we will define shortly in case you're not familiar, as a target for at least a partial security guarantee. The good thing about these notions is that they really imply security against many classes of attacks that have been considered in the literature, and also some that haven't been. These are usually what we call statistical attacks, which exploit some non-random properties of the block cipher over a few evaluations. So let me now be a little more formal.

So what do I mean by T-wise independence? The idea is this: if I evaluate the block cipher by encrypting T distinct inputs, the outputs I get under a random secret key are supposed to look uniform. Now, this is too strong to ask for, in particular because block ciphers implement permutations, so the outputs are going to be distinct as well. And it's too strong for other reasons too, so we're going to target a relaxation of this, which is what we call epsilon-closeness to T-wise independence. This means we require that the distribution of the outputs is epsilon-close to uniform in the sense of statistical distance. And obviously, the larger the T and the smaller the epsilon, the better this is going to be.
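On a toy domain, this relaxed notion can even be measured by brute force. The following sketch (with hypothetical helper names) computes the worst-case statistical distance, over pairs of distinct inputs, between the pair of outputs under a random key and a uniform pair of distinct values; that worst case is exactly the epsilon in "epsilon-close to pairwise independence".

```python
from collections import Counter
from itertools import permutations

def pairwise_epsilon(cipher, keys, n):
    """Worst case, over distinct inputs x1 != x2, of the statistical distance
    between (E_K(x1), E_K(x2)) for a random key K and a uniform pair of
    distinct values from an n-element domain. Feasible only for toy n."""
    uniform = 1.0 / (n * (n - 1))
    worst = 0.0
    for x1, x2 in permutations(range(n), 2):
        hits = Counter((cipher(x1, k), cipher(x2, k)) for k in keys)
        sd = 0.5 * (sum(abs(c / len(keys) - uniform) for c in hits.values())
                    + (n * (n - 1) - len(hits)) * uniform)
        worst = max(worst, sd)
    return worst

# A pure key-XOR "cipher" is far from pairwise independent, because the
# output difference always equals the input difference:
print(pairwise_epsilon(lambda x, k: x ^ k, range(16), 16))  # about 0.93
```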

So first of all, this notion doesn't run into the dilemma of finding an assumption, because it can actually be proved unconditionally for a block cipher. That's something we can hope for if the key is long enough. That's usually not a problem here for small T's, because we will do the same thing that all prior analyses, including cryptanalysis, have done: we just assume that the subkeys are independent, which is a heuristic assumption but a standard one. So as the number of rounds grows, the key becomes longer and longer. And also note that for T equal to 2, which will be the core of our results for SPNs, the result is already interesting. It implies security against differential attacks, which has been a huge topic and is still partially unresolved for proving the security of AES, but also much more. It is the first kind of result that implies that the outputs actually look random.

And then for larger T's it implies security against even more attacks, like higher-order differential attacks. I'll tell you about our main result later, which is about concrete SPNs and pairwise independence, but I want to first mention the first result that we have in the paper, without giving you a proof. It's not about a particular concrete construction; it's actually an existential result, but I find it particularly insightful because it gives us some ideas about how security is bootstrapped inside this type of construction. Basically, our first result is an existential result that looks at Key-Alternating Ciphers, (indistinct) SPNs, so we look at the whole permutation, and also a relaxed version that allows for different permutations at each round. What we show is that you can always come up with good instantiations of these permutations such that an R-round Key-Alternating Cipher gives you an almost R-wise independent permutation. So intuitively, what that means is that the degree of independence you can get really scales linearly with the number of rounds, and each round amplifies independence a little bit more. This result is existential, so it's proven through the probabilistic method, but what's important to me here is that the permutations can be fixed. So it's not an ideal-model analysis. You can pick these permutations; in fact, if you pick them randomly, most of them are going to be good and you get a secure construction. Unfortunately, this is the only result I'm going to state about general T-wise independence.

For SPNs and for AES, what we deal with for now is mostly pairwise or 2-wise independence. And so what we analyze here is… The result is a little bit more general, but two instantiations of it are for SPNs that are instantiated either with the inverse S-box, like AES, or with functions like cube, which is for example used in MiMC, which you might have heard about; it's a popular cipher for MPC, FHE, and zero-knowledge type applications. Also, I won't talk explicitly about the mixing layer, but the result applies to any reasonable mixing layer that has been considered in practice; in fact, we can even weaken the assumptions on that. So our main theorems are actually about reduced-round versions of SPNs.

And we really prove concrete bounds. Again, I stress, this is for the concrete construction; there is no idealization or anything. So we do prove, first of all, that 2-round and 3-round SPNs already give you some degree of pairwise independence, for corresponding epsilons. These are general theorems that hold for arbitrary choices of the parameters, of how you split the blocks. What's important here is that you don't have to parse the bounds exactly, but if you look at them, these two bounds are actually not good enough to imply any sort of meaningful security for AES; they actually become larger than one for those parameters. And we also have a third, more refined argument that applies to the actual AES and gives us a non-trivial bound on the pairwise independence of 6-round AES.

Now, you might wonder here, or at least you should wonder, whether these results are any good at all, because they only apply to reduced-round versions and these epsilons are fairly large. If you pay attention to them, they depend on B, which is the S-box size. The way to think about them, as of now, is that they are small enough to be amplifiable, and what I mean by that is that you can amplify generically. There's an old fact that has been proved: if you have two constructions that are somewhat pairwise independent, say one of them is epsilon-close and the other one is delta-close, and you sequentially compose them, then you get something which is two times epsilon times delta close. So you can iterate this, and that factor of two basically tells you that as long as you start with some epsilon which is less than one half, you can then go as arbitrarily small as you want.
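In numbers, the lemma says: composing an epsilon-close and a delta-close pairwise-independent permutation gives something (2 * epsilon * delta)-close, so r sequential copies of an epsilon-close construction are ((2 * epsilon)^r) / 2-close. A quick sketch of the iteration:

```python
def amplify(eps, copies):
    """Iterate the composition lemma: eps-close composed with delta-close is
    (2 * eps * delta)-close. Progress requires eps < 1/2."""
    e = eps
    for _ in range(copies - 1):
        e = 2 * e * eps
    return e  # closed form: (2 * eps) ** copies / 2

for r in (1, 5, 10, 20):
    print(r, amplify(0.4, r))  # 2 * 0.4 < 1, so the bound decays geometrically
print(amplify(0.6, 10))        # eps > 1/2: the bound blows up instead
```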

So for example, we can get a corollary directly for AES: if you take 6r rounds of AES, then we get an exponentially decaying bound on epsilon as r grows. And again, the number of rounds you need to get something reasonable is very high for now, but this is the first result ever for AES or a similar construction that tells us, even though it looks very simple, it's only two inputs, that you get something which actually looks random. Brent, you have a question? (audience member speaking) Oh yeah. (audience member speaking) It should be, or at least it should be smaller than one half. Yeah, exactly. So the point is that it becomes greater than one.

So we have to do something different to get to the actual AES parameters. (audience member speaking) You cannot amplify generically. So that's why you have to look at a few more rounds and then prove directly that the bound is small enough to amplify. (audience member speaking) But there are other cases where it might be useful, because AES is pretty extreme in having just single bytes, and then K, which is how many blocks you have, is fairly large. It would be reasonable to have something where you just have two blocks. I mean, for all we know that would still be secure. (audience member speaking) Yeah, AES is extremely parallel; that's why it was built like that. (audience member speaking) I mean, for the whole thing it's hard.

If you mean experimentally, for small-scale versions we can do that. (audience member speaking) For the 2-round version you can, yeah. (audience member speaking) I think for 2 rounds we are probably close. I can tell you at the end, when I give you a sketch… There are still 10 minutes or so, and I can tell you where it's loose. Also on this note, I want to say there's been some work on being very precise about computing differential probabilities for reduced-round versions of AES. The results are somewhat incomparable, in the sense that for small numbers of rounds we don't get the precision they have, if we just want the standard implication, but also their results are too weak to imply any sort of uniformity. So these are two different approaches and two different techniques. Also, to answer some of these questions, I want to give you a brief high-level overview of the proofs in the last few minutes.

The first thing I want to say is something about mindset. This is something that I think was clear to (indistinct) and to all cryptanalysts, but at least it was not clear to me until I started realizing it myself for the first time. Differences, as in differential cryptanalysis, are really what matters. I know this sounds like a high-level meta statement that doesn't make any sense, but I want to make it precise. Remember that the problem we are trying to solve here is: we look at a pair of inputs, we evaluate the SPN with the same key, we get two outputs, and we want to show that they are jointly close to uniform. It turns out that if we instead look at the difference of the two outputs, then the statement is equivalent. This is a small exercise that you can do. And in fact, the last key doesn't really matter with respect to the statement; it matters for making the individual outputs uniform.

And then you can show that, because the individual outputs are uniform, if the difference is also uniform then the two outputs are jointly close to uniform. But it's not only that. Also, look at what adding keys does to the construction: for example, look at the two states after adding the first key and consider their joint distribution…

If you look at the difference after adding the first key there as well, what this first key does is that it basically randomizes this pair of inputs into something which is uniform conditioned on having the same difference as the input. So again, difference here means XOR, I should have said that; it's the same as minus, it's the same thing. So we really want to understand how these differences behave as they evolve through the construction, starting from the S-box level. So for example, we now have an S-box evaluated on two inputs with the same key, XORing the same key.
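This re-randomizing property of key addition is a two-line fact, but it is worth seeing explicitly; a quick check:

```python
import os

x1, x2 = 0xA5, 0x3C
delta = x1 ^ x2
for _ in range(5):
    k = os.urandom(1)[0]        # a fresh random key byte
    y1, y2 = x1 ^ k, x2 ^ k     # key addition on both tracks
    assert y1 ^ y2 == delta     # the XOR difference is unchanged, while
                                # (y1, y2) is uniform among all 256 pairs
                                # with difference delta
```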

We look at the output: if the two inputs have difference delta, we want to understand how the output difference behaves. And in fact, I'm going to confuse you a little bit by using this notation from now on, by just writing everything with just one track. What I mean is that there are really two tracks with two inputs and the difference is delta, right? I'm just representing it like this, but it's just for (indistinct), right? So we want to understand the distribution of the output difference as a function of the input difference. And what turns out to be really interesting is that if you work with finite fields of characteristic two, very strange things happen. One thing that happens here, for this S-box being the inverse, and I'm talking of the inverse from now on, is that there's some nice linear algebra going on. The way you should think about this output difference is basically as a vector which is sampled from a very large space.

And the space is the set of strings that are orthogonal to the input difference. Now, this is very surprising. In fact, it cannot be true exactly as I stated it, because obviously if you choose different representations of the field, things might look a little bit different. But it's the right picture to have. What happens in reality is that there are some mappings pi and pi prime that you can apply to the differences so that (indistinct) this is true, right? So you really can think of this S-box as having a geometric interpretation. If you have an input difference, the output is obtained by just sampling something (indistinct) at random and then kind of decoding it in some other way. I won't give you the proof, but it's really fairly elementary, actually, using finite fields. Now you start getting this feeling that the S-box is sort of adding entropy to the state, if you think of entropy in terms of min-entropy.
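This structure is easy to observe experimentally. The sketch below tabulates the output differences of the patched inverse over GF(2^8) for one fixed input difference, i.e., one row of the difference distribution table; one sees 127 reachable output differences out of 256, 126 of them hit exactly twice and one hit four times, which is the "large, nearly flat set" picture (these numbers are my own quick check, not a statement from the paper).

```python
from collections import Counter

def gf256_mul(a, b):
    """Multiplication in GF(2^8) mod x^8+x^4+x^3+x+1, as in the earlier sketch."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

def patched_inverse(x):
    r = 1
    for _ in range(254):  # x^254 = x^(-1), and 0 -> 0
        r = gf256_mul(r, x)
    return r

S = [patched_inverse(x) for x in range(256)]

delta = 0x01  # any fixed nonzero input difference works
row = Counter(S[x] ^ S[x ^ delta] for x in range(256))
print(len(row), "reachable output differences out of 256")
print(sorted(Counter(row.values()).items()))  # multiplicity profile of the row
```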

So if we start from an initial fixed difference and run it through an S-box, what we have seen is that under a random key, the difference at the output of the S-box is going to be something which already has fairly high min-entropy, almost maximal but missing one bit. Of course, that one missing bit is sort of crucial. So a lot of the work now is to figure out where uniformity comes from if you're missing one bit of entropy. For example, one hope we have is that if we apply the S-box another time, we get something which is close to uniform. And it turns out that the core of our proof is exactly to analyze this type of extraction that happens on the S-box side. So what we prove, at least in the simplest version of the statement, is that this is exactly what happens: if you apply your S-box to something which has high min-entropy, then the difference you get at the output is going to be near uniform.
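Here is an empirical version of that extraction step, as a toy check of my own rather than the paper's bound: draw the input difference from a distribution with min-entropy B - 1 (uniform over a random half of the byte values), push a pair with that difference through the patched inverse under a random key, and measure how far the output difference is from uniform. One gets a small distance (a few percent), whereas a single fixed input difference sits at a distance of about one half.

```python
from collections import Counter
import random

def gf256_mul(a, b):
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return p

S = [0] * 256
for x in range(1, 256):
    r = 1
    for _ in range(254):
        r = gf256_mul(r, x)
    S[x] = r  # patched-inverse table; S[0] stays 0

random.seed(0)
support = random.sample(range(1, 256), 128)  # input difference: uniform on a
out = Counter()                              # random half, min-entropy 7 = B-1
for d in support:
    for x in range(256):                     # uniform x stands in for the key
        out[S[x] ^ S[x ^ d]] += 1
total = 128 * 256
sd = 0.5 * sum(abs(out.get(b, 0) / total - 1 / 255) for b in range(1, 256))
print(f"statistical distance from uniform: {sd:.3f}")
```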

And in fact, we look at a more general version of this, which is what we refer to as the Extraction Lemma, which is a parallel version of this. We look at many S-boxes in parallel, K of them, with independent keys, and we have K input differences, and now we want to understand the distribution of the K output differences. And we can prove different types of statements here. For example, a weak one simply tells us that as long as all of these input differences individually have sufficient min-entropy, at least B minus one, the outputs are going to be close to uniform. And again, the inputs can be completely correlated; in fact, they could all be the same value, they don't need to be independent. So clearly, the independence is created by using different keys for each one of the S-boxes. And we also have a stronger condition where in fact we assume some independence, which (indistinct) requires a condition for every subset of blocks.

The entropy grows with the number of blocks: the joint entropy of every subset is at least B minus one times the number of blocks in that subset. Then, in fact, we get an even stronger quantitative statement which tells us that we are even closer. Closer means that basically there's an exponential improvement in the dependency on K, the number of blocks. I'm not going to go into the proof of this, but it's a neat application of Fourier analysis that gives us this. The statement is also more general if you look at it in the paper; the entropy doesn't need to be B minus one, it's just a more general statement. So let me conclude by telling you how this fits together. Of course, I'm being very informal, but I think some of the issues that (indistinct) was asking about are going to become clear here. For example, for our weaker 2-round statement, it's very nice, because you can start thinking about what the different parts do, what roles the different components play in the SPN.

By that I mean, let's say that we start with some input difference. This is kind of the extreme case: three blocks, only one block has a nonzero difference. Now we want this to propagate to a uniform difference, right? And why is this happening? Well, first of all, what we have seen is that if we apply one round of S-boxes, then just by the subspace-sampling idea, the one nonzero block is going to become high-entropy; it's going to have B minus one bits of entropy. And now, what happens if the mixing layer is good (and in fact you don't need much of it) is that if you feed in something where one block has entropy, you expect the entropy to spread out. Again, with correlations, but at least we want to make sure that each individual block has B minus one bits of entropy. And then we can just directly apply the weak Extraction Lemma and this is going to give uniformity.

In fact, we don't even need the last mixing layer; it's just there to kind of annoy us. But it's a permutation, so it's going to preserve the near uniformity. Now the 3-round version, and this is where I think there's a lot of room to improve things. Basically, what we do is go back to what happens after the first round. All the blocks have sufficient min-entropy, but now we sort of forget this extra information. All we use here is the fact that this implies that, with good probability, all of these blocks are going to be different from zero, just by a union bound (the quick calculation below puts a number on this). And then we can apply another round of S-boxes, but now, because the blocks are all individually different from zero, each block is going to add randomness independently, conditioned on the output of the first round. So now we get blocks that individually have high min-entropy and, crucially, are independent.
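To put a number on that union bound, under AES-like parameters (the concrete values here are my assumption for illustration):

```python
# Each block's difference has min-entropy at least b-1 bits, so it equals
# zero with probability at most 2**-(b-1); a union bound over the k blocks:
k, b = 16, 8                 # AES-like: sixteen 8-bit S-box blocks
print(k * 2 ** -(b - 1))     # 0.125 -- a noticeable loss in the bound
```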

And then again, we require a good mixing layer, and all the well-designed ones really do this: it preserves these properties. And now we can just apply the stronger version of the Extraction Lemma and get a better bound. Now, where is it loose? For example, one place where this is clearly loose is just this assumption that all of the blocks are different from zero; we have to do a union bound with terms on the order of one over two to the B, which is not great. That's clearly one place where we could improve, but we don't quite have the most elegant way; we have some thoughts, and I can talk about it offline. So this is everything I wanted to say. I'm sorry, we're running out of time. But one thing I wanted to point out is that, at least the way we see this, it is really part of a broader agenda.

And in fact, I feel personally, and it's been my personal journey looking at many questions in symmetric crypto, that the communities, cryptanalysis and theory, have grown a little bit apart, but there are really a lot of very good theoretical questions here once you change your mindset a little bit about what theoretical cryptography is, perhaps. So the first thing I want to say is… let me first give you three concrete questions that we are thinking about, where we have some partial progress. For example, one thing we can definitely work on, which I have already mentioned, is trying to give tighter bounds. We have some ideas on how to do that, but again, it's not clear whether they are as tight as possible. For now, you get very lousy parameters out of this if you want to justify practice. Also, T-wise independence for AES and SPNs beyond pairwise is really completely open.

It's actually interesting; it gives me some pause for concern, actually, because if you look at how things were designed when people were coming up with AES, it was all related to differential probability. All of the components are made to work for input differences and how they propagate. So I'm not even sure whether this will work out or whether we will have to rethink these designs a little bit; it's very different from what has been done. So again: T-wise independence for T larger than two. And then alternative designs: why only look at AES, once we understand how this really works? And key schedules also: how the subkeys are generated is something which is completely open; even in cryptanalysis, people haven't really looked at this. And the broader question I really want to point out is that this is perhaps on the practical end of applications, even though I think it's fundamental, but I think we are seeing, also across theoretical crypto, with obfuscation for example,

(indistinct) has talked about this, more and more of these types of constructions that are useful but are not provably secure in any meaningful sense. And I think it's important to keep thinking about these types of theoretical questions: maybe forget about reductions, but try to prove as much as you can about these constructions. All right, so thanks so much. And the paper is on ePrint.

Stefano Tessaro

Associate Professor | University of Washington

Stefano Tessaro is an Associate Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he co-leads the cryptography group. He works primarily on cryptography, computer security and theoretical computer science. Earlier, he was an assistant professor at UC Santa Barbara for five years, where he held the Glen and Susanne Culler chair. He obtained a PhD from ETH Zurich in 2010, and held postdoctoral appointments at UCSD and MIT. He received several awards for his research, including a Sloan Research Fellowship, an NSF CAREER award, and a Hellman Fellowship, as well as a best-paper award at EUROCRYPT 2017.
