Eli Yablonovitch

Professor, UC Berkeley

Physics Successfully Implements Lagrange Multiplier Optimization

Transcript of the presentation Physics Successfully Implements Lagrange Multiplier Optimization, given at the NTT Upgrade 2020 Research Summit, September 29, 2020

 

Hello everybody. My title is Physics Implements Lagrange Multiplier Optimization, and let me be very specific about what I mean by this: in physics, there are a series of principles that are optimization principles, and we are just beginning to take advantage of them. The most famous in physics is the principle of least action. Of equal importance is the principle of least entropy generation; that is to say, a dissipative circuit will try to adjust itself so as to dissipate as little as possible. There are other concepts too: first-to-gain-threshold, the variational principle, the adiabatic method, and simulated annealing, or rather actual physical annealing.

 

So let’s look at some of these. One I’m sure you know about is the principle of least time, illustrated by a lifeguard who is trying to save a swimmer: the lifeguard runs as fast as possible along the sand and finally jumps in the water. It’s like the refraction of light. The lifeguard is trying to get to the swimmer as quickly as possible, and so follows the path that takes the least amount of time. This of course occurs in optics; in classical mechanics the corresponding idea is the principle of least action.
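
The lifeguard picture can be checked numerically. Here is a minimal sketch, with made-up illustrative geometry and speeds: we minimize the total travel time over the point where the lifeguard enters the water, and then verify that the optimal path obeys Snell's law, sin(theta1)/v1 = sin(theta2)/v2, just as refracting light does.

```python
import math

# Assumed geometry: lifeguard at (0, 10) on sand, swimmer at (30, -10)
# in water; the shoreline is the line y = 0. Speeds are illustrative.
v_sand, v_water = 7.0, 2.0   # run fast on sand, swim slowly in water

def travel_time(x):
    """Total time if the lifeguard enters the water at (x, 0)."""
    on_sand  = math.hypot(x, 10.0) / v_sand
    in_water = math.hypot(30.0 - x, 10.0) / v_water
    return on_sand + in_water

# Golden-section search for the entry point of least time.
lo, hi = 0.0, 30.0
phi = (math.sqrt(5) - 1) / 2
while hi - lo > 1e-9:
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if travel_time(a) < travel_time(b):
        hi = b
    else:
        lo = a
x_star = (lo + hi) / 2

# Snell's law check: sin(theta)/v matches on both sides of the shoreline.
sin1 = x_star / math.hypot(x_star, 10.0)
sin2 = (30.0 - x_star) / math.hypot(30.0 - x_star, 10.0)
print(sin1 / v_sand, sin2 / v_water)  # the two ratios agree at the optimum
```

The point of the sketch is that "refraction" falls out of pure minimization: nothing in the code knows Snell's law, yet the least-time path satisfies it.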

 

Let me show you another one: the principle of minimum power dissipation. Imagine you had a circuit like this, where the current was dividing unequally between two equal branches. That should make you feel very uncomfortable. The circuit will automatically adjust itself so that the two branches, which are equal, draw equal amounts of current; if the currents are unequal, the circuit dissipates excess power. So we talk about least power dissipation; a more sophisticated way of saying the same thing is least entropy production. This is actually the most common principle of all.
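
You can verify the current-sharing claim with a one-line minimization. The sketch below, using illustrative component values, splits a fixed total current I between two equal resistors and checks that the dissipated power, I1^2*R1 + (I - I1)^2*R2, is smallest at exactly the split that circuit theory's current-divider rule would give.

```python
# Fixed total current I splits between two parallel branches R1, R2.
# Physics picks the split that minimizes the total dissipated power.
R1, R2, I = 5.0, 5.0, 2.0     # two equal branches (illustrative values)

def power(i1):
    """Total dissipation when branch 1 carries i1 and branch 2 carries I - i1."""
    return i1**2 * R1 + (I - i1)**2 * R2

# Scan finely over possible splits for the minimum-dissipation one.
best = min((k / 10000 * I for k in range(10001)), key=power)

# The circuit-theory answer: current divider i1 = I * R2 / (R1 + R2).
divider = I * R2 / (R1 + R2)
print(best, divider)  # both give the equal split, 1.0 A per branch
```

With unequal resistors the same scan reproduces the general current-divider formula, so minimum dissipation and Kirchhoff's laws agree in general, not just for the symmetric case.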

 

Here’s one that’s kind of interesting, and people have made a lot of hay about it: you have a laser and you try to reach threshold. You have different modes along the horizontal axis, and one mode happens to have the lowest loss, so all the energy goes into that mode. This is first-to-gain-threshold. It is also a type of minimization principle, because physics finds the mode with the lowest gain threshold. Now, what I’ll show is that it’s not as good as it seems, because even after you reach the gain threshold there continues to be evolution among the modes. So it’s not quite as clear cut as it might seem.

 

Here’s the famous one, the variational principle. You have a trial wave function, the red one; it’s no good because it has too much energy. The true wave function is illustrated in green. Physics automatically finds the wave function with the lowest energy.
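
As a concrete illustration of the variational idea (a textbook example, not one from the talk): for the harmonic oscillator H = -1/2 d^2/dx^2 + 1/2 x^2 in units hbar = m = omega = 1, a Gaussian trial function exp(-a x^2) has the closed-form energy expectation E(a) = a/2 + 1/(8a). Every trial lies above the true ground-state energy, and scanning the trial parameter recovers the exact answer, a = 1/2 with E = 1/2.

```python
# Variational principle for the 1-D harmonic oscillator (hbar = m = omega = 1).
# Trial wave function psi(x) = exp(-a * x**2); its energy expectation is
# E(a) = a/2 + 1/(8a), a standard closed-form result.
def trial_energy(a):
    return a / 2 + 1 / (8 * a)

# Scan trial parameters: every trial energy is an UPPER bound on the
# true ground-state energy, and the best trial hits it exactly.
candidates = [k / 1000 for k in range(1, 3001)]   # a in (0, 3]
best_a = min(candidates, key=trial_energy)
print(best_a, trial_energy(best_a))  # 0.5 and 0.5: the exact ground state
```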

Here’s one more: physical annealing, which you can also think of as simulated annealing. In simulated annealing, you add noise, or raise the temperature, or do something else to jump out of local minima. You do tend to get stuck in all of these methods; you tend to get stuck in local minima, and you have to find a strategy to jump out of them. But physical annealing actually promises to give you a global optimum, so we’ve got to keep that one in mind.
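
A minimal simulated-annealing sketch, on an illustrative landscape rather than any of the physical systems in the talk: a tilted double well has a shallow local minimum near x = +1 and a deeper global minimum near x = -1. A walker started in the shallow well would be stuck under pure steepest descent; with thermal noise and a slow cooling schedule, it can hop the barrier and find the deep well.

```python
import math, random

random.seed(0)

def f(x):
    # Tilted double well: shallow local minimum near x = +1,
    # deeper global minimum near x = -1.
    return (x * x - 1.0) ** 2 + 0.3 * x

x = 1.0                      # start stuck in the shallow (local) well
best_x = x                   # best point visited so far
T = 2.0                      # initial "temperature"
while T > 1e-3:
    x_new = x + random.gauss(0.0, 0.5)           # proposed random hop
    dE = f(x_new) - f(x)
    # Metropolis rule: always accept downhill moves,
    # accept uphill moves with probability exp(-dE / T).
    if dE < 0 or random.random() < math.exp(-dE / T):
        x = x_new
        if f(x) < f(best_x):
            best_x = x
    T *= 0.999                                    # slow cooling schedule
print(round(best_x, 1))      # the noise lets the walker escape to the deep well
```

The cooling rate matters: cool too fast and the walker freezes in the shallow well, which is exactly the "stuck in local minima" failure mode mentioned above.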

 

And then there’s the adiabatic method. In the adiabatic method, you have modes with avoided crossings. Now, I am one who believes that we could do this even classically, just with LC circuits. The avoided crossings are such that you start from a solvable problem and then go to a very difficult-to-solve problem, and yet you stay in the ground state; I’m sure you all know this. This is the adiabatic method. Some people think of it as quantum mechanical, and it can be, but it’s also classical: what you’re adjusting is one of the inductances in a complicated LC circuit. And this is another illustration of the same thing, a slightly more complicated graph: you go from a simple Hamiltonian to a hard Hamiltonian, and you find a solution that way.
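
The avoided crossing can be illustrated with the simplest possible model, the generic two-mode Hamiltonian [[eps, Delta], [Delta, -eps]] (a standard textbook caricature, not the specific LC circuit in the talk). As the tuning parameter eps sweeps from negative to positive, the two eigenvalues, plus and minus sqrt(eps^2 + Delta^2), never touch: the coupling Delta holds a gap open, and that open gap is what lets a slow sweep keep you in the lowest mode.

```python
import math

DELTA = 0.2   # coupling between the two modes (illustrative value)

def levels(eps):
    """Eigenvalues of [[eps, DELTA], [DELTA, -eps]]: an avoided crossing."""
    r = math.hypot(eps, DELTA)
    return -r, +r

# Sweep the tuning parameter; the gap between the levels never closes.
gaps = [levels(e / 10)[1] - levels(e / 10)[0] for e in range(-20, 21)]
print(min(gaps))  # minimum gap is 2 * DELTA, reached at eps = 0
```

With DELTA = 0 the two levels would cross and a sweep could jump branches; any nonzero coupling converts the crossing into an avoided one.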

 

So these are all minimization principles. Now, one of the preferred attributes is a digital answer, which we can get with bistable elements, and physics is loaded with bistable elements, starting with the flip-flop. You can imagine somehow coupling them together; I show you here just resistors. But it’s very important that you don’t have a pure analog machine. You want a machine that provides digital answers. The flip-flop is actually an analog machine, but it locks into a digital state. So we want bistable elements that will give us binary answers.
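
The "analog element that locks into a digital state" can be caricatured by the one-variable dynamics dx/dt = x - x^3, a standard bistable toy model (not a circuit from the talk). Any nonzero initial value flows to +1 or -1 depending only on its sign, so a continuous analog trajectory ends at a binary answer.

```python
def settle(x, dt=0.01, steps=2000):
    """Integrate dx/dt = x - x**3 by forward Euler. The fixed points
    +1 and -1 are the two 'digital' states; x = 0 is the unstable
    balance point between them."""
    for _ in range(steps):
        x += dt * (x - x ** 3)
    return x

# A small positive start locks to +1, a tiny negative start locks to -1.
print(settle(0.3), settle(-0.0001))
```

This is the sense in which a flip-flop is analog yet digital: the state variable is continuous, but the only attractors are the two binary values.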

 

Okay, so having quickly gone through these, which is the best? Let’s try to answer which of these is the best for doing optimization; which physics principle might be the best? One of the nice problems that we like to solve is the Ising problem, and there’s a way to set that up with circuits. You can take LC circuits and mimic the ferromagnetic case, where the two circuits are in phase, by trying to lock them into either positive or negative phase. You can do that with parametric gain: classical parametric gain, with a two-omega modulation on a capacitor, is bistable. And if you have crossed couplings, then the phases tend to be opposite, so you get anti-ferromagnetic coupling. So you can mimic the Ising problem with these circuits, but there are many ways to mimic it; we’ll see some more examples.
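
To make the target problem concrete, here is the Ising problem itself in a few lines (a generic brute-force check, independent of any hardware): minimize the energy E = -sum over bonds of J_ij * s_i * s_j, with each spin s_i = +1 or -1. With antiferromagnetic couplings (J < 0) on a triangle, no assignment can satisfy all three bonds at once; that frustration is part of what makes large instances hard.

```python
from itertools import product

# Antiferromagnetic couplings (J < 0) on a 3-spin triangle (illustrative).
J = {(0, 1): -1.0, (1, 2): -1.0, (0, 2): -1.0}

def energy(spins):
    """Ising energy E = -sum_ij J_ij * s_i * s_j."""
    return -sum(Jij * spins[i] * spins[j] for (i, j), Jij in J.items())

# Brute force over all 2**3 spin assignments: fine here, hopeless at scale,
# which is why one wants hardware that descends to the answer directly.
ground = min(product([-1, +1], repeat=3), key=energy)
print(ground, energy(ground))  # frustrated: at best 2 of the 3 bonds are happy
```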

 

Now, one of the main points I’m going to make today is that it’s very easy to set up a physical system that not only does optimization, but also includes constraints, and constraints are what we normally take into account with Lagrange multipliers. This is sort of an explanation of Lagrange multipliers: you’re trying to go toward the absolute optimum here, but you run into the red constraint, so you get stopped right there. At that point the gradient of the constraint is opposite to the gradient of the merit function; they cancel each other. This is standard stuff in college, Lagrange multiplier calculus.
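
That textbook picture can be checked numerically. A minimal sketch, with a made-up merit function and constraint: maximize f(x, y) = x + y on the circle g(x, y) = x^2 + y^2 - 1 = 0. At the constrained optimum the two gradients line up, grad f = lambda * grad g, with a single multiplier lambda serving both components.

```python
import math

# Merit function f and constraint g (illustrative textbook example):
# maximize f(x, y) = x + y subject to g(x, y) = x**2 + y**2 - 1 = 0.
# Parametrize the constraint circle and scan for the best point on it.
best_t = max((k / 10000 * 2 * math.pi for k in range(10000)),
             key=lambda t: math.cos(t) + math.sin(t))
x, y = math.cos(best_t), math.sin(best_t)

# Lagrange condition: grad f = lambda * grad g, component by component.
grad_f = (1.0, 1.0)
grad_g = (2 * x, 2 * y)
lam = grad_f[0] / grad_g[0]
print((round(x, 3), round(y, 3)), round(lam, 3))  # one lambda fits both components
```

The multiplier is not an afterthought: it is the "price" of the constraint, and in the physical systems discussed here it shows up as a real circuit quantity rather than a bookkeeping variable.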

 

So if physics does this, how does it do it? Well, it does it by steepest descent; we just follow it. Physics, for example, will try to go to the state of lowest power dissipation. So it goes, and it minimizes the dissipation, shown in blue, while also trying to satisfy the constraint, and finally we find the optimum point in some multi-dimensional configuration space. Another way of saying it: we go from some initial state to some final state, and physics does this for you for free, because it is always trying to reduce the entropy production, the power dissipation.

 

And so I’m going to show you now five different schemes; actually I have about eight. They all use the principle of minimum entropy generation, but not all of them recognize it. Here’s some work from my colleague Roychowdhury, here in my department. He has these very amplitude-stable oscillators that tend to lock into a phase, and in this way it’s natural for solving the Ising problem. If you analyze it in detail, and I’ll show you the link to the arXiv where we’ve shown this, this scheme is trying to satisfy the principle of minimum entropy generation, and it includes constraints. The most important constraint for us is that we want a digital answer, either a plus or a minus, and the parametric oscillator permits that. He’s not using a parametric oscillator; he’s using something a little different, a sort of second-harmonic locking, but it’s similar to the parametric oscillator.

 

And here’s another approach, from Cambridge University in England; I have the symbol of the university here. They have polaritons, exciton-polaritons, and they were very excited about that. But to us these are really just coupled electromagnetic modes, created by optical excitation, and they lock into definite phases; no big surprise there. It tends to lock in such a way that it minimizes the power dissipation, and it is very easy to include the digital constraint there. So that’s yet another example. Of course, all the examples I’m going to show you from the literature follow the principle of minimum entropy generation, though this is not always acknowledged by the authors.

 

This is the Yamamoto Stanford approach; thank you very much for inviting me. I’ve analyzed this one, and we think we know what’s going on here. The quantum mechanical version could possibly be very interesting, but the versions that are out there right now are dissipative: there’s dissipation in the optical fiber, and it’s overcome by the parametric gain. The net conclusion is that the different optical parametric oscillator pulses are trying to organize themselves in such a way as to minimize the power dissipation. So it’s based upon minimum entropy generation, which for our purposes is synonymous with minimizing the power dissipation. And it is very beautifully done; it is a very beautiful system, because it’s time multiplexed and it locks into digital answers. So that’s very nice.

 

Here’s something different, from MIT: not the Ising problem, but it is an optimizer, an optimizer for artificial intelligence. It uses silicon photonics and does unitary operations. We’ve gone through this very carefully. I’m sure the people at MIT think they have something very unusual, but to us this is usual: it is an example of minimizing the power dissipation. As you go around, over and over again, through the silicon photonics, you end up minimizing the power dissipation. It’s kind of surprising. The principle of minimum entropy generation, again.

 

Okay. And this is from my own group, where we try to mimic the coherent Ising machine, except it’s electrical. This is an anti-ferromagnetic configuration; if the resistors were connected the other way, it would be a ferromagnetic configuration, and we can arrange that. So I’ve shown just five examples; I could have done a few more, but we’re running out of time. All of these optimization approaches are similar in that they’re based upon minimum entropy generation, which is, I don’t want to say a law of physics, but it’s accepted by many physicists. And you have the different examples, including particularly MIT’s optimizer for artificial intelligence.
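
A software caricature of such a machine, using an assumed Hopfield-style gradient flow rather than the actual circuit equations: each "oscillator" carries a continuous variable that a bistable nonlinearity pushes toward +1 or -1, while a coupling J pulls its neighbor toward the Ising-favored alignment. For two units with antiferromagnetic coupling, the network settles into opposite digital states, the Ising answer.

```python
# Two coupled bistable "oscillators" with antiferromagnetic coupling J < 0.
# Assumed dynamics (a standard Hopfield-style caricature, not the circuit):
#   ds_i/dt = s_i - s_i**3 + sum_j J_ij * s_j
# This is gradient descent on an energy with an Ising-like coupling term.
J = -0.5                       # antiferromagnetic coupling strength
s = [0.2, -0.1]                # small initial amplitudes (illustrative)
dt = 0.01
for _ in range(5000):
    d0 = s[0] - s[0] ** 3 + J * s[1]
    d1 = s[1] - s[1] ** 3 + J * s[0]
    s = [s[0] + dt * d0, s[1] + dt * d1]
print([1 if v > 0 else -1 for v in s])  # the two spins end with opposite signs
```

The bistable term supplies the digital constraint and the coupling supplies the Ising merit function; the steady state delivers both at once, which is the whole point of the talk.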

 

They all seem to take advantage of this type of physics; they’re all versions of minimum entropy generation. The physics hardware implements steepest descent physically, and because of the constraint, it produces a binary output, which is digital in the same sense that a flip-flop is digital. What’s the promise? The promise is that physics-based hardware will perform the same function at far greater speed and with far less power dissipation.

 

Now, the challenge of global optimization remains unsolved; I don’t think anybody has a solution to the problem of global optimization. We can try to do better, we can get a little closer. But even setting that aside, there are all these terrific applications in deep learning, neural-network back-propagation, artificial intelligence, and control theory. So there are many applications: operations research, biology, et cetera.

 

But there are a couple of action items needed to go further. I believe that the electronic implementation is perhaps a little easier to scale, and so we need to design some chips: a chip with an array of oscillators. If you had a thousand LC oscillators on the chip, I think that would already be very interesting. But you need to interconnect them, and that would require a resistive network with about a million resistors. I think that can also be done on a chip. Minimizing the power dissipation is the whole point, but there is one real problem: the resistors have to be very precise. There’s good news, though.

 

Resistors can be programmed very accurately, and I’ll be happy to take questions on that. A later step, though, once we have the chips, is compiler software to convert the problem at hand into the resistance values that will fit within these oscillator chips. So let me pause for questions, and thank you very much for your attention.
