## Elaine shi

**Professor, Cornell and Carnegie Mellon University**

Can We Beat the AKS Sorting Network?

Transcript of the presentation *Can We Beat the AKS Sorting Network?,* given at the NTT Upgrade 2020 Research Summit, September 30, 2020

Thank you so much for inviting me. I’d like to talk about some new understanding of sorting circuits. We’ve been working on this line of work since 2017. So, I’ll mention the results from a few papers to paint essentially the frontier of our understanding. This is trying work with many wonderful collaborators. Some of these studies were also partly motivated by a somewhat different quest, that is: how to construct optimal oblivious RAM. Although, in this talk I’ll mostly focus on the circuit model rather than the ORAM. At the end of the talk, however, I’ll quickly mention how the circuit results are related to the optimal ORAM, and how they are not related. In other words, how the circuit techniques actually depart from the techniques we use to construct optimal ORAM. I’ll only have time to mention the results and you’ll have to read our paper for the details.

Sorting circuits have been studied for many decades. A long-standing open question in the complexity and algorithms literature is the following: Does there exist sorting circuits of o(n log n) size? So what do we know about this question? First, we know that to get anything better than (n log n), it cannot be comparison-based. Comparison-based has a well-known (n log n) lower bound. And this lower bound applies no matter whether it’s the RAM model or circuit model. If we forego the comparison-based requirement, however, we know that on the RAM, sorting can be accomplished in nearly linear time.

Unfortunately, these RAM algorithms, critically rely on dynamic memory axes and cannot be converted to the circuit model in a way that preserves efficiency. Okay, so we’ve been stuck on this question for several decades. As I mentioned, this is one of the well-known long-standing open questions in the complexity and algorithms literature. Somehow, we cannot make progress either on the upper bound and lower bound fronts. In some sense, it’s almost surprising that after so many years, we still don’t understand sorting circuits. So, let’s see why we are so stuck.

On the upper bound front, we are stuck at the AKS sorting network from 1983. The AKS sorting network is comparison based, so it is actually optimal in the comparative-based model. A long-standing question is: Can we beat AKS if we forego the comparison-based restriction? It turns out that we haven’t made any progress at all along this front. On the lower bound side, we also seem to be pretty stuck. In fact, not only do we not know how to prove (n log n) lower bound on sorting circuits; we don’t know how to prove a superlinear lower bound. And in fact, it turns out proving superlinear circuit lower bound for any problem into the n key, is beyond the reach of current techniques.

Okay, despite all these long-standing barriers, we were able to make a little progress in terms of understanding sorting circuit complexity, both on the upper bound and the lower bound fronts. On the upper bound side, somewhat imprecisely speaking, we showed that sorting (n) elements, each tagged with the k-bit key can be accomplished with a circuit of size and timescale. So if, for example, K is asymptotically smaller than log n, we can actually defeat the AKS sorting network. Our result can also be viewed as a generalization of the AKS sorting circuits. And note that I’m ignoring polylog star terms in the bound.

On the lower bound side, we showed that essentially the above upper bound is tight for every choice of (k), either assuming the indivisibility model or assuming the Li-Li network coding conjecture. So let me explain. The indivisibility model assumes that the element’s payload strings are opaque, and the circuit does not perform any encoding or computation on the element’s payload strings. And indeed, almost all of the algorithms we know are, indeed, in the indivisibility model. Now the Li-Li conjecture is a well-known conjecture in the area of network coding. It posits that network coding cannot help anything beyond the standard multi-commodity flow rates in undirected graphs. So while no one knows how to prove unconditional super linear circuit lower bounds, we were able to prove a conditional lower bound. And, you know, the lower bound also implies that if the Li-Li network coding conjecture is true, then one cannot build a sorting circuit of o(n log n) size for the, you know, case of general keys.

So for the rest of the talk, let me say something about this upper bound and why it turns out to be very much nontrivial. In fact, it turns out that even for the 1-bit key special case, the result is very much nontrivial and there are many natural barriers towards achieving these results. So essentially in the 1-bit special keys, right, the result says we can sort 1-bit keys in the linear-sized circuits. I also want to mention that in the problem formulation, besides the 1-bit key, every element also has a payload string. And when you sort, you have to carry the payload string around; because otherwise, had it not been the payload string, you could just count how many ones there are, and then write down the answer.

Before I talk about, you know, even why the 1-bit case has many barriers, let me actually quickly mention that the 1-bit case has a very cool implication. It implies that median can be computed in the linear-sized circuit, as well. Remember in your undergrad algorithms class, we learned the textbook Blum’s algorithm for computing median on the RAM. And we know that it can be computed in linear time deterministically on the RAM. And in fact, this is one of my favorites when I teach undergrad algorithms.

So you would almost expect that, I mean, of course, naturally, you should be able to do the same, you know, in the linear-sized circuit. But it turns out to be much harder than you might expect. And no one really knows how until our work. In some sense, the natural barriers for sorting 1-bit key also apply to median too. And for both of these problems, sorting 1-bit keys and median, in the circuit model, like, believe it or not, the prior best-known solution is actually AKS sorting circuits itself. And nothing better is known so far. And so to help you understand why, you know, for something so natural, like, it’s so natural that if I didn’t tell you it’s hard, you’d almost take for granted.

And, and let me explain why there are natural barriers. So the first barrier was actually described, even in Knuth’s textbook from the 1970s. And, you know, the textbook said essentially, such a result would not have been possible in the comparative-based model. And the reason is because of the zero-one principle, right, so the zero-one principle is that any comparison-based sorting circuit, if it can sort zero-one keys, it can also sort general keys. So, therefore, the (n log n) lower bound for comparative-based sorting actually applies even to the 1-bit key case. Okay, well, to the best of our knowledge, our existing sorting circuit constructions indeed are in the comparison-based model. You know, it’s been a natural question, like, whether we can achieve anything better using non comparison-based techniques. Nothing is known, and this is not like the RAMs model, right? In the RAM model, we know how to make use of non comparison-based techniques to get interesting results.

The second barrier was actually recently shown in our own work, as well as a work by [unintelligible] and others. Okay, so it turns out for the 1-bit key special case, if you require stability in the sorting, again there’s (n log n) barrier. And again, the barrier holds either assuming the indivisibility model or the Li-Li network coding conjecture.

Here, stability means that for the elements with the same key, we insist that the other in the output array must respect the relative other in the input array. Okay, so to get a linear-sized sorting circuit for 1-bit keys, not only do we have to forgo the comparison-based restriction, we also have to forego stability. So finally, you know, when we overcome these barriers, and we are eventually able to construct a sorting circuit for 1-bit keys, the next question is how to upgrade it to a sorting circuit for k-bit keys. And here, we encounter another challenge. And the challenge is exactly because the 1-bit sorting circuit is not stable, right?

Had it been stable, you know, a natural idea would be to use radiant sorts. But radiant sorts expect that the 1-bit sorting case, you know, is stable. So in fact, to do this upgrade, we came up with a new technique, which is like a clever two-parameter recursion technique. Okay, so I won’t have time to go into details. Let me quickly comment about the techniques at a very high level. So, essentially, we start with Pippenger’s self-routing super concentrator. Imprecisely speaking, if we directly converted his super concentrator construction to the circuit model, we would incur (n log n), but then we can rely on the cool observation that was actually made in the earlier work on constructing smart-depth, perfect, oblivious, parallel RAM.

So in this work we observed that Pippenger’s super concentrated construction actually has the online phase and the offline phase. So interestingly, the offline phase depends only on metadata, and it never looks at the element’s payload strings. And also, interestingly, it’s also the offline phase that’s causing the (n log n). Whereas the online phase is easier to implement in linear-sized circuits. So, by exploiting the fact that the offline phase operates only on metadata, we are able to use a recursive bootstrapping technique to essentially squash the (n log n) to something like (n polylog * n).

So last, but not least, let me say a few words about how this is related to optimal ORAM. In optimal ORAM, we essentially need an oblivious algorithm that sorts 1-bit keys in linear time, and this is on the RAM, right? So to get that, we also rely on Pippenger’s super concentrated construction, and we rely on the same offline, online insights, but then to get rid of the (n log n) in the oblivious RAM model actually almost requires, like, in some sense, the opposite techniques from the circuit model, right? So in the, on the RAM model, we know that every word has log n bits, and we can simultaneously perform log n bitwise operations in unit costs, you know, because this is like operation on a single word.

So therefore on oblivious RAM, one of the core techniques we use is packing. But, you know, packing is, like, there’s no free packing in the circuit model, like in a circuit model, every wire, every gate would have counted. So therefore, the algorithm tricks we use in the circuit model is actually rather different. Okay, I guess this is about as much as I can say about this line of work. To summarize, it’s almost surprising that after so many years, we still don’t understand sorting circuits. In fact, you know, it looks like we’ve been pretty stuck since 1980s. So we were able to actually push the frontier of our understanding a little bit, both in terms of upper and lower bounds. Thank you so much for your attention.

Can We Beat the AKS Sorting Network?

## Elaine Shi,

Professor, Cornell and Carnegie Mellon University

## recent publications by Elaine Shi

- Sorting Short Keys in Circuits of Size o(n log n)
- On the Adaptive Security of MACs and PRFs
- Round-Efficient Byzantine Broadcast Under Strongly Adaptive and Majority Corruptions
- Secure Massively Parallel Computation for Dishonest Majority
- Elaine Shi's Google Scholar page
- Summary of Elaine Shi's presentation