Zoltan Toroczkai

University of Notre Dame

Towards Understanding the Fundamental Limits of Analog, Continuous-time Computing

Transcript of the presentation Towards Understanding the Fundamental Limits of Analog, Continuous-time Computing, given at the NTT Upgrade 2020 Research Summit, September 29, 2020

 

Hello everyone. My name is Zoltan Toroczkai. I am from the Department of Physics at the University of Notre Dame, and I would like to thank the organizers for their kind invitation to participate in this very interesting and promising workshop. I would also like to say that I look forward to collaborations with the PHI Lab, with Yoshi and his collaborators, on the topics of this work. Today I will briefly talk about our attempt to understand the fundamental limits of analog, continuous-time computing, at least from the point of view of Boolean Satisfiability problem solving using ordinary differential equations. But I think the issues we raise here actually apply to other analog approaches, and to other problems, as well. I think everyone here knows what Boolean Satisfiability problems are.

 

You have N Boolean variables and M clauses, each a disjunction of K literals; a literal is a variable or its negation. The goal is to find an assignment to the variables such that all the clauses are true. This is a decision-type problem from the class NP, which means you can check in polynomial time whether any given assignment satisfies the formula. K-SAT is NP-complete for K of 3 or larger, which means an efficient 3-SAT solver implies an efficient solver for all problems in NP, because every problem in NP can be reduced in polynomial time to 3-SAT. As a matter of fact, you can reduce the NP-complete problems into one another: you can go from 3-SAT to Set Packing, or to Maximum Independent Set, which is Set Packing in graph-theoretic terms, or to the decision version of the Ising problem. This is useful when you are comparing different approaches or working on different kinds of problems.
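
As a concrete illustration of the polynomial-time check just mentioned, here is a minimal sketch (not from the talk; the encoding of a clause as a tuple of signed integers, +i for x_i and -i for its negation, is an assumption made for this example):

```python
# Minimal sketch: verifying a SAT assignment in polynomial time.
# Encoding assumed here: a clause is a tuple of nonzero integers,
# +i meaning x_i and -i meaning NOT x_i.

def satisfies(clauses, assignment):
    # Every clause must contain at least one literal made true by the assignment.
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# Example: (x1 OR x2 OR NOT x3) AND (NOT x1 OR x2 OR x3) AND (x1 OR NOT x2 OR x3)
clauses = [(1, 2, -3), (-1, 2, 3), (1, -2, 3)]
print(satisfies(clauses, {1: True, 2: False, 3: True}))  # -> True
```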

 

When not all the clauses can be satisfied, you are looking at the optimization version of SAT, called Max-SAT, where the goal is to find the assignment that satisfies the maximum number of clauses; this is in the NP-hard class. In terms of applications, if we had an efficient SAT solver, or NP-complete problem solver, it would positively influence thousands of problems and applications in industry and science. I am not going to read this list, but it gives us some motivation to work on these kinds of problems. Now, our approach to SAT solving involves embedding the problem in a continuous space, and we use ordinary differential equations to do that. Instead of working with zeros and ones, we work with minus ones and plus ones, we allow the corresponding variables to change continuously between these two bounds, and we formulate the problem with the help of a clause matrix.

 

If a clause does not contain a variable or its negation, the corresponding matrix element is zero; if it contains the variable in positive form, it is one; if it contains the variable in negated form, it is minus one. We use this matrix to form products called clause violation functions, one for every clause, which vary continuously between zero and one and are zero if and only if the clause itself is satisfied. We then define the search dynamics in the N-dimensional hypercube where the search happens; if solutions exist, they sit at some of the corners of this hypercube. We define an energy, or potential landscape, function as shown here, in such a way that it is zero if and only if all the clause violation functions Km are zero, that is, all clauses are satisfied, while keeping the auxiliary variables am always positive. The dynamics is then essentially a gradient descent on this potential energy landscape. If we were to keep all the am constant, it would get stuck in some local minimum. What we do instead is couple the dynamics of the am to the clause violation functions, as shown here. If we did not have the am and used just the Km, we would still have a positive feedback, and it would find solutions better than the constant version, but it would still get stuck.
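
In formulas, the construction described above looks roughly as follows; this is a sketch following the published continuous-time formulation, and the normalization and symbol names are assumptions rather than a verbatim reading of the slides:

```latex
% c_{mi} \in \{-1, 0, +1\} is the clause matrix; k_m is the number of literals in clause m.
K_m(\mathbf{s}) = 2^{-k_m} \prod_{i=1}^{N} \left(1 - c_{mi}\, s_i\right), \qquad
V(\mathbf{s}, \mathbf{a}) = \sum_{m=1}^{M} a_m\, K_m(\mathbf{s})^2,
% gradient descent on V in the s_i, with exponentially driven auxiliary variables a_m:
\dot{s}_i = -\frac{\partial V}{\partial s_i}
          = \sum_{m=1}^{M} 2\, a_m\, c_{mi}\, K_{mi}(\mathbf{s})\, K_m(\mathbf{s}), \qquad
\dot{a}_m = a_m\, K_m(\mathbf{s}), \qquad a_m(0) > 0,
```

Here K_{mi} denotes K_m with the factor containing s_i left out; each K_m lies between zero and one and vanishes when the clause is satisfied.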

 

Only when we include these am, which make the dynamics of these variables exponential-like, does the system keep searching until it finds a solution. There is a reason for that which I am not going to talk about here, but it essentially boils down to performing gradient descent on a globally time-varying landscape, and this is what works. Now let me talk about the good, the bad, and maybe the ugly. What is good is that this is a hyperbolic dynamical system, which means that if you take any domain of the search space that does not contain a solution, the number of trajectories in it decays exponentially quickly, and the decay rate is an invariant characteristic of the dynamics itself, called in dynamical systems the escape rate. Its inverse is the timescale on which this dynamical system finds solutions. You can see some trajectories here; they are curved because the system is nonlinear and transiently chaotic, but if solutions exist, the dynamics does eventually lead to them.
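
Stated as a formula (the symbols here are mine, for illustration): if a bundle of trajectories is started inside a solution-free domain of the search space, the number still inside at time t decays as

```latex
N(t) \;\simeq\; N(0)\, e^{-\kappa t},
```

where \kappa is the escape rate, so 1/\kappa sets the timescale on which solutions are found.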

 

Now, in terms of performance: what we show here, for a range of constraint densities, defined by M over N, the ratio of clauses to variables, for random 3-SAT problems, is the wall-clock time as a function of N. It behaves quite well, essentially polynomially, until you reach the SAT/UNSAT transition, where the hardest problems are found.

 

But what is more interesting is the performance in terms of the analog continuous time t, because that seems to be polynomial. The way we show this is by considering random K-SAT, here random 3-SAT, at a fixed constraint density, right at the threshold where the problems are really hard. We select thousands of problems at that clause-to-variable ratio, solve them with our algorithm, and monitor the fraction of problems that have not yet been solved by continuous time t. As you can see, this fraction decays exponentially, with different decay rates for different system sizes, and this plot shows that the decay rate behaves polynomially in N, actually as a power law.
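
In symbols (chosen here just for illustration), the two observations are that the fraction of instances still unsolved at analog time t behaves as

```latex
p_N(t) \;\simeq\; e^{-r(N)\, t}, \qquad r(N) \;\sim\; N^{-\beta}, \quad \beta > 0,
```

an exponential decay in t whose rate falls off only as a power law of the problem size N.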

 

So if you combine these two, you find that the time needed to solve all problems, except maybe a [unintelligible] fraction of them, scales polynomially with problem size. So you have polynomial continuous-time complexity. This is also true for other types of very hard constraint satisfaction problems, such as Exact Cover, because you can always transform them into 3-SAT as we discussed before, and Ramsey coloring; on these problems even algorithms like survey propagation will fail. But this does not mean that P equals NP, because, first of all, if you were to implement these equations in a device whose behavior is described by these ODEs, then of course t, the continuous-time variable, becomes physical wall-clock time.
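
Combining the two scaling facts above (same illustrative symbols as before), the time needed to leave at most a fraction \epsilon of the instances unsolved is

```latex
t_\epsilon(N) \;=\; \frac{\ln(1/\epsilon)}{r(N)} \;\sim\; N^{\beta}\, \ln(1/\epsilon),
```

which is the polynomial continuous-time scaling being referred to.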

 

That would scale polynomially, but you have other variables, the auxiliary variables, which fluctuate in an exponential manner. So if they represent currents or voltages in your realization, this would be an exponential-cost algorithm. But this is a kind of trade between time and energy: I do not know how to generate time, but I do know how to generate energy, so it could still be useful. There are other issues as well, though, especially if you try to do this on a digital machine, and other problems appear in physical devices too, as we will discuss later.
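
Where the exponential cost comes from can be read off the sketched equation for the auxiliary variables:

```latex
\dot{a}_m = a_m\, K_m
\;\;\Longrightarrow\;\;
a_m(t) \;=\; a_m(0)\, \exp\!\left( \int_0^{t} K_m\big(\mathbf{s}(t')\big)\, dt' \right),
```

so as long as clause m remains violated (K_m > 0), a_m keeps growing exponentially; if the a_m are realized as currents or voltages, that growth is the energetic price paid for the polynomial time.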

 

If you implement this on a GPU, you can get a speedup of about two orders of magnitude, and you can also modify the approach to solve Max-SAT problems quite efficiently; we are competitive with the best heuristic solvers on the problems from the 2016 Max-SAT competition. So this is definitely a good approach, but there are of course interesting limitations. I say interesting because they make you think about what is needed and how you can use these observations to better understand analog continuous-time complexity. If you solve this on a digital machine you are using some kind of integrator, and if you monitor the number of discrete steps taken by the Runge-Kutta integrator, using the same approach as before but now measuring the fraction of problems not yet solved after a given number of discrete integrator steps, you find that you have exponential discrete-time complexity. And of course this is a problem. If you look closely at what happens: even though the analog mathematical trajectory, the red curve here, changes very little, only in the third or fourth digit of precision, the discrete-time behavior of the integrator fluctuates like crazy.
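
To make the discrete-step picture concrete, here is a toy sketch, not the GPU implementation from the talk, that integrates the dynamics sketched earlier with an explicit Runge-Kutta method and reports how many discrete steps the integrator takes; the tiny instance and all parameter choices are assumptions for illustration:

```python
# Toy sketch: integrate the continuous-time SAT dynamics with an explicit
# Runge-Kutta method (scipy's RK45) and count the discrete steps it takes.
import numpy as np
from scipy.integrate import solve_ivp

def make_rhs(C):
    """C is the M x N clause matrix with entries in {-1, 0, +1}."""
    M, N = C.shape
    k = np.count_nonzero(C, axis=1)        # number of literals per clause
    norm = 2.0 ** (-k)
    def rhs(t, y):
        s, a = y[:N], y[N:]
        terms = 1.0 - C * s                # factor equals 1 wherever C[m, i] == 0
        K = norm * np.prod(terms, axis=1)  # clause violation functions
        K_mi = np.empty_like(terms)        # K_m with the i-th factor left out
        for i in range(N):
            K_mi[:, i] = norm * np.prod(np.delete(terms, i, axis=1), axis=1)
        ds = 2.0 * np.einsum('m,mi,mi,m->i', a, C, K_mi, K)  # -dV/ds_i
        da = a * K                                            # auxiliary dynamics
        return np.concatenate([ds, da])
    return rhs

# Tiny satisfiable instance: (x1 v x2 v ~x3) & (~x1 v x2 v x3) & (x1 v ~x2 v x3)
C = np.array([[ 1.0, 1.0, -1.0],
              [-1.0, 1.0,  1.0],
              [ 1.0, -1.0, 1.0]])
rng = np.random.default_rng(0)
y0 = np.concatenate([rng.uniform(-1, 1, C.shape[1]), np.ones(C.shape[0])])
sol = solve_ivp(make_rhs(C), (0.0, 50.0), y0, method="RK45", rtol=1e-6, atol=1e-9)
print("discrete RK steps taken:", sol.t.size - 1)
print("candidate assignment:   ", np.sign(sol.y[:C.shape[1], -1]))
```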

 

So the integration effectively freezes out, and this is because of the phenomenon of stiffness, which I will say a bit more about later. It may look like an integration issue on your digital machine that you could improve, and you certainly can improve it, but the issue is actually bigger and deeper than that, because on a digital machine there is no time-energy conversion: the auxiliary variables are represented efficiently, so there is no exponentially fluctuating current or voltage in your computer when you do this. So if P is not equal to NP, then the exponential time complexity, or exponential cost complexity, has to hit you somewhere, and this is how.

 

One would be tempted to think that maybe this would not be an issue in an analog device, and to some extent that is true: analog devices can be orders of magnitude faster. But they also suffer from their own problems, because P not equal to NP affects that class of solvers as well. Indeed, if you look at other systems, like the Coherent Ising Machine with measurement feedback, polariton condensate graphs, or oscillator networks, they all hinge on some ability to control real variables with arbitrarily high precision. In oscillator networks, you want to read out arbitrarily close frequencies; in the case of CIMs, we require identical analog amplitudes, which are hard to maintain, and they tend to fluctuate and shift away from one another. If you could control that, of course, then you could control the performance.

 

So one can ask whether this is a universal bottleneck, and it seems so, as I will argue next. We can recall a fundamental result by Arnold Schönhage from 1978, a purely computer-science proof, which says that if you are able to compute the addition, multiplication, and division of real variables with infinite precision, then you can solve NP-complete problems in polynomial time. He does not actually propose a solver; he just shows mathematically that this would be the case. Now, of course, in the real world you have loss of precision, so the next question is: how does that affect the computation of these problems? This is what we are after. Loss of precision means information loss, or entropy production.

 

So what we are really looking at is the relationship between the hardness of a problem and the cost of computing it. As this chart shows, there is a left branch which in principle could be polynomial time, but the question is whether that is achievable; it probably is not, and something more achievable is on the right-hand side. There is always going to be some information loss, some entropy generation, that could keep you away from polynomial time. This is what we would like to understand. And the source of this information loss is not just noise, which, as I will argue, is present in any physical system, but it is also of an algorithmic nature. Schönhage's result, again, is purely theoretical; no actual solver is proposed.

 

So we can ask, just theoretically, out of curiosity: could such solvers exist in principle? Because he is not proposing a solver. That is, would a solver exist which, if you could follow mathematically and precisely what it does, would have the right properties? And I argue yes. I do not have a mathematical proof, but I have some arguments that this would be the case, and that it is the case for our [unintelligible] solver: if you could compute its trajectory without loss in this way, then it would solve NP-complete problems in polynomial continuous time. Now, as a matter of fact, this is a slightly more delicate question, because time in ODEs can be rescaled however you want.

 

So what Bournez says is that you actually have to measure the length of the trajectory, which is an invariant of the dynamical system, a property of the dynamical system and not of its parametrization. And we did that. My student Shubha Kharel did that, by first improving on the stiffness problem of the integration, using implicit solvers and some smart tricks so that you stay closer to the actual trajectory, and then using the same approach as before to see what fraction of problems you can solve. If you now plot the length of the trajectory, you find that it scales polynomially with the problem size. So we have polynomial-length complexity, which means our solver is both a poly-length and, as defined, a poly-time analog solver. But if you look at it as a discrete algorithm, measuring the discrete steps on a digital machine, it is an exponential solver, and the reason is this stiffness.
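
The reparametrization-invariant quantity in question is the arc length of the trajectory; the particular norm below is just an illustrative choice:

```latex
\ell(T) \;=\; \int_0^{T} \big\lVert \dot{\mathbf{x}}(t) \big\rVert \, dt,
```

which is unchanged under any rescaling of the time variable, so it measures the search itself rather than how fast the clock is made to run.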

 

Every integrator has to discretize and truncate the equations, and what it has to do is keep the integration within the so-called stability domain of the scheme: you have to keep the product of the eigenvalues of the Jacobian and the step size within this region. If you use explicit methods, you want to stay within this region, but what happens is that some of the eigenvalues grow fast for stiff problems, and then you are forced to reduce delta t so that the product stays in this bounded domain. That means you are forced to take smaller and smaller time steps, so you are freezing out the integration, and as I showed you, that is what happens.
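
As a concrete example of this constraint, for the simplest explicit scheme, forward Euler (used here purely for illustration), a real negative Jacobian eigenvalue \lambda must satisfy

```latex
\left| 1 + \Delta t\, \lambda \right| \;\le\; 1
\quad\Longrightarrow\quad
\Delta t \;\le\; \frac{2}{|\lambda|},
```

so when stiffness makes the largest |\lambda| blow up, the admissible step size collapses with it.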

 

Now, you can move to implicit solvers, which is the usual trick. In this case your stability domain is actually on the outside. But what happens then is that some of the eigenvalues of the Jacobian, for these instances, start to move toward zero, and as they move toward zero they enter the instability region. So your solver tries to keep them out by increasing delta t, but if you increase delta t you increase the truncation errors, so you get randomized in this large search space. So it is really not going to work out.

 

Now, one can introduce a theory, or a language, to discuss analog computational complexity using the language of dynamical systems theory. I do not have time to go into this, but basically, for hard problems you have a chaotic object, a chaotic saddle, somewhere in the middle of the search space, and that dictates how the dynamics unfolds; the invariant properties of that saddle are, of course, what determine the performance and many other things.

 

An important measure that we find helpful in describing this analog complexity is the so-called Kolmogorov, or metric, entropy. Intuitively, it describes the rate at which the uncertainty contained in the insignificant digits of a trajectory flows toward the significant ones, as you lose information because small errors grow into larger errors at an exponential rate, since you have positive Lyapunov exponents. But this is an invariant property: it is a property of the set of trajectories, not of how you compute them. It is really the intrinsic rate of accuracy loss of a dynamical system.

 

As I said, in such a high-dimensional dynamical system you have positive and negative Lyapunov exponents, as many in total as the dimension of the space; the number of positive ones gives the dimension of the unstable manifold and the number of negative ones the dimension of the stable manifold. And there is an interesting and, I think, important equality, called the Pesin equality, that connects the information-theoretic quantity, the rate of information loss, with the geometric rate at which trajectories separate, minus the escape rate that I already talked about.
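
Written out, in the form I understand it for such open, transiently chaotic systems (symbols as in the earlier discussion):

```latex
h_{\mathrm{KS}} \;=\; \sum_{\lambda_i > 0} \lambda_i \;-\; \kappa,
```

where h_KS is the Kolmogorov (metric) entropy, the \lambda_i are the Lyapunov exponents, and \kappa is the escape rate from before; for a closed system, \kappa = 0 and this reduces to the usual Pesin equality.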

 

Now, one can actually prove a simple theorem, a back-of-the-envelope type of calculation. The idea is that you know the largest rate at which closely started trajectories separate from one another. So you can say that things are fine as long as the trajectory finds the solution before nearby trajectories separate too much. In that case, I can hope that if I start several closely spaced trajectories from some region of the phase space, they all end up in the same solution over time. That gives the upper bound shown in this limit, and it shows that the initial separation has to be an exponentially small number; it depends on the N-dependence of the exponent right here, which combines the information-loss rate and the solution-time performance.
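
A back-of-the-envelope version of that argument, with illustrative symbols: two trajectories started a distance \delta_0 apart separate roughly as \delta_0 e^{\lambda_1 t}, where \lambda_1 is the largest Lyapunov exponent, so for them to still end up in the same solution within the solution time one needs roughly

```latex
\delta_0\, e^{\lambda_1(N)\, t_{\mathrm{sol}}(N)} \;\lesssim\; \epsilon
\quad\Longrightarrow\quad
\delta_0 \;\lesssim\; \epsilon\, e^{-\lambda_1(N)\, t_{\mathrm{sol}}(N)},
```

where \epsilon sets the target accuracy of hitting the solution.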

 

So if this exponent has a strong N-dependence, even just a linear N-dependence, then you really have to start the trajectories exponentially close to one another in order to end up in the same solution. So this is the direction we are going in, and this formulation is applicable to all deterministic dynamical systems. I think we can take this further, because there is a way of getting an expression for the escape rate in terms of N, the number of variables, from cycle expansions, which I do not have time to talk about, but it is a program that one can try to pursue.

 

And this is it. The conclusions, I think, are self-explanatory. I believe there is a lot of future in analog continuous-time computing. These systems can be more efficient by orders of magnitude than digital ones in solving NP-hard problems, because, first of all, many of them lack the von Neumann bottleneck, there is parallelism involved, and you also have a larger spectrum of continuous-time dynamical algorithms than discrete ones. But we also have to be mindful of what the possibilities and the limits are. One important open question is: what are these limits? Is there some kind of no-go theorem that tells you that you can never perform better than this or that limit? I think that is the exciting part: to derive these limits and to reach an understanding of what is possible in this area. Thank you.
