Hideo Mabuchi

Professor, Applied Physics, Stanford University

How Combinatorial Optimization Can Bring Improvements to Coherent Ising Machines and other Computing Challenges

Transcript of the presentation How Combinatorial Optimization Can Bring Improvements to Coherent Ising Machines and other Computing Challenges, given at the NTT Upgrade 2020 Research Summit, September 29, 2020

 

Hi, I’m Hideo Mabuchi from Stanford University. This is my presentation on coherent nonlinear dynamics and combinatorial optimization. This talk introduces an approach we are taking to the analysis of the performance of coherent Ising machines. So let me start with a brief introduction to Ising optimization. The Ising model represents a set of interacting magnetic moments or spins, with total energy given by the expression shown at the bottom left of the slide. Here the sigma variables take binary values. The matrix element Jij represents the interaction strength and sign between any pair of spins i and j, and hi represents a possible local magnetic field acting on each spin. The Ising ground state problem is to find an assignment of binary spin values that achieves the lowest possible value of the total energy, and an instance of the Ising problem is specified by given numerical values for the matrix J and vector h.
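To make the definition concrete, here is a minimal brute-force sketch in Python. The slide's exact expression is not reproduced in this transcript, and sign conventions and the factor of 1/2 vary between authors, so the energy function below is one common convention rather than the talk's formula:

```python
import numpy as np

def ising_energy(sigma, J, h):
    """Total energy under one common sign convention:
    E = -1/2 * sigma^T J sigma - h . sigma, with sigma_i in {-1, +1}."""
    sigma = np.asarray(sigma)
    return -0.5 * sigma @ J @ sigma - h @ sigma

def brute_force_ground_state(J, h):
    """Exhaustive search over all 2^n assignments; feasible only for small n."""
    n = len(h)
    best_sigma, best_e = None, np.inf
    for bits in range(2 ** n):
        sigma = np.array([1 if (bits >> i) & 1 else -1 for i in range(n)])
        e = ising_energy(sigma, J, h)
        if e < best_e:
            best_sigma, best_e = sigma, e
    return best_sigma, best_e

# Ferromagnetic two-spin instance: the aligned configurations are the ground states.
J = np.array([[0.0, 1.0], [1.0, 0.0]])
h = np.zeros(2)
sigma, e = brute_force_ground_state(J, h)
print(sigma, e)  # aligned spins, energy -1.0
```

The exhaustive loop makes the exponential scaling of exact search tangible: each added spin doubles the search space.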

 

Although the Ising model originates in physics, the ground state problem corresponds to what is called quadratic binary optimization in the field of operations research. In fact, in terms of computational complexity theory, it can be established that the Ising ground state problem is NP-complete. Qualitatively speaking, this makes the Ising problem a representative sort of hard optimization problem, for which it is expected that the runtime required by any computational algorithm to find exact solutions should asymptotically scale exponentially with the number of spins, for worst-case instances at least.

 

Of course, there’s no reason to believe that the problem instances that actually arise in practical optimization scenarios are going to be worst-case instances. It’s also not generally the case in practical optimization scenarios that we demand absolutely optimal solutions. Usually we’re more interested in just getting the best solution we can within an affordable cost, where cost may be measured in terms of time, service fees, and/or energy required for computation. This focuses great interest on so-called heuristic algorithms for the Ising problem and other NP-complete problems, which generally find very good, but not guaranteed optimal, solutions and run much faster than algorithms designed to find absolute optima.

 

To get some feeling for present-day numbers, we can consider the famous traveling salesman problem (TSP), for which extensive compilations of benchmarking data may be found online. A recent study found that the best-known TSP solver required median runtimes, across a library of problem instances, that scaled as a very steep root exponential for n up to approximately 4,500. This gives some indication of the change in runtime scaling for generic, as opposed to worst-case, problem instances. Some of the instances considered in this study were taken from a public library of TSPs derived from real-world VLSI design data.

 

This VLSI TSP library includes instances with n ranging from 131 to 744,710. Instances from this library with n between 6,880 and 13,584 were first solved just a few years ago, in 2017, requiring days of runtime on a 48-core two-gigahertz cluster. All instances with n greater than or equal to 14,233 remain unsolved exactly by any means. Approximate solutions, however, have been found by heuristic methods for all instances in the VLSI TSP library; for example, a solution within 0.014% of a known lower bound has been discovered for an instance with n = 19,289, requiring approximately two days of runtime on a single core at 2.4 gigahertz.

 

Now, if we simple-mindedly extrapolate the root-exponential scaling from the study up to n = 4,500, we might expect that an exact solver would require something more like a year of runtime on the 48-core cluster used for the n = 13,584 instance. This shows how a very small concession on the quality of the solution makes it possible to tackle much larger instances at much lower cost. At the extreme end, the largest TSP ever solved exactly has n = 85,900. This is an instance derived from 1980s VLSI design, and it required 136 CPU-years of computation, normalized to a single core at 2.4 gigahertz. But the 20-fold larger, so-called World TSP benchmark instance, with n = 1,904,711, has been solved approximately, with an optimality gap bounded below 0.0474%.
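The extrapolation being described can be sketched in a couple of lines. The scaling constant b below is purely hypothetical; the fitted constants from the cited benchmarking study are not reproduced in this talk:

```python
import math

def extrapolate_root_exponential(t_ref, n_ref, n_target):
    """Extrapolate a runtime that scales as t(n) = a * exp(b * sqrt(n)).
    Given one anchor point (t_ref, n_ref), the prefactor a cancels and
    only the ratio of exponentials remains."""
    b = 0.1  # hypothetical scaling constant, NOT from the cited study
    return t_ref * math.exp(b * (math.sqrt(n_target) - math.sqrt(n_ref)))

# If a solver took 1 unit of time at n = 4,500, the same root-exponential
# trend predicts the runtime at n = 13,584:
print(extrapolate_root_exponential(1.0, 4500, 13584))
```

Even with a modest b, the predicted runtime grows by orders of magnitude between these instance sizes, which is the point of the simple-minded extrapolation above.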

 

Coming back to the general practical concerns of applied optimization, we may note that a recent meta-study analyzed the performance of no fewer than 37 heuristic algorithms for MaxCut and quadratic binary optimization problems, and found that different heuristics work best for different problem instances selected from a large-scale heterogeneous test bed, with some evident but cryptic structure in terms of what types of problem instances were best solved by any given heuristic. Indeed, there are reasons to believe that these results for MaxCut and quadratic binary optimization reflect a general principle of performance complementarity among heuristic optimization algorithms. In the practice of solving hard optimization problems, there thus arises the critical pre-processing issue of trying to guess which of a number of available good heuristic algorithms should be chosen to tackle a given problem instance.

 

Assuming that any one of them would incur a high cost to run on a large problem instance, making an astute choice of heuristic is a crucial part of maximizing overall performance. Unfortunately, we still have very little conceptual insight into what makes a specific problem instance good or bad for any given heuristic optimization algorithm. This has been pinpointed by researchers in the field as a circumstance that must be addressed. So adding this all up, we see that a critical frontier for cutting-edge academic research involves both the development of novel heuristic algorithms that deliver better performance at lower cost on classes of problem instances underserved by existing approaches, and fundamental research to provide deep conceptual insight into what makes a given problem instance easy or hard for such algorithms. In fact, these days, as we talk about the end of Moore’s law and speculate about a so-called second quantum revolution, it’s natural to talk not only about novel algorithms for conventional CPUs, but also about highly customized, special-purpose hardware architectures on which we may run entirely unconventional algorithms for combinatorial optimization, such as the Ising problem.

 

So against that backdrop, I’d like to use my remaining time to introduce our work on the analysis of coherent Ising machine architectures and associated optimization algorithms. Ising machines in general are a novel class of information processing architectures for solving combinatorial optimization problems by embedding them in the dynamics of analog physical or cyber-physical systems. In contrast to both more traditional engineering approaches that build Ising machines using conventional electronics, and more radical proposals that would require large-scale quantum entanglement, the emerging paradigm of coherent Ising machines (CIMs) leverages coherent nonlinear dynamics in photonic or optoelectronic platforms to enable near-term construction of large-scale prototypes that leverage post-CMOS information dynamics.

 

The general structure of current CIM systems is shown in the figure on the right. The role of the Ising spins is played by a train of optical pulses circulating around a fiber-optic storage ring. A beam splitter inserted in the ring is used to periodically sample the amplitude of every optical pulse, and the measurement results are continually read into an FPGA, which then uses them to compute perturbations to be applied to each pulse by synchronized optical injections. These perturbations are engineered to implement the spin-spin coupling and local magnetic field terms of the Ising Hamiltonian, corresponding to the linear part of the CIM dynamics. A synchronously pumped parametric amplifier, denoted here as PPLN waveguide, adds a crucial nonlinear component to the CIM dynamics as well. In the basic CIM algorithm, the pump power starts very low and is gradually increased. At low pump powers, the amplitudes of the Ising spin pulses behave as continuous complex variables, whose real parts, which can be positive or negative, play the role of soft or mean-field spins. Once the pump power crosses the threshold for parametric self-oscillation in the optical fiber ring, however, the amplitudes of the Ising spin pulses become effectively quantized into binary values. While the pump power is being ramped up, the FPGA subsystem continuously applies its measurement-based feedback implementation of the Ising Hamiltonian terms.

 

The interplay of the linearized Ising dynamics implemented by the FPGA and the threshold quantization dynamics provided by the synchronously pumped parametric amplifier results in a final state of the optical pulse amplitudes, at the end of the pump ramp, that can be read out as a binary string giving a proposed solution of the Ising ground state problem. This method of solving Ising problems seems quite different from a conventional algorithm that runs entirely on a digital computer, as a crucial aspect of the computation is performed physically by the analog, continuous, coherent nonlinear dynamics of the optical degrees of freedom. In our efforts to analyze CIM performance, we have therefore turned to the tools of dynamical systems theory, namely a study of bifurcations, the evolution of critical points, and topologies of heteroclinic orbits and basins of attraction.
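A commonly used classical mean-field caricature of this pump-ramp-and-readout procedure can be sketched in a few lines. The equation of motion and all parameter values below are illustrative simplifications, not the actual hardware model or any parameters from the talk:

```python
import numpy as np

def cim_solve(J, steps=2000, dt=0.01, p_max=2.0, eps=0.5, seed=0):
    """Toy mean-field model of CIM dynamics (an illustrative simplification):
    dx_i/dt = (p - 1) x_i - x_i^3 + eps * sum_j J_ij x_j,
    with the pump p ramped linearly from 0 to p_max. The cubic term stands
    in for gain saturation; eps sets the feedback-coupling strength."""
    rng = np.random.default_rng(seed)
    n = J.shape[0]
    x = 1e-3 * rng.standard_normal(n)   # near-vacuum initial amplitudes
    for k in range(steps):
        p = p_max * k / steps           # linear pump ramp
        dx = (p - 1.0) * x - x**3 + eps * (J @ x)
        x = x + dt * dx
    return np.sign(x)                   # read out a binary spin assignment

# Ferromagnetic n = 2 instance: both "spins" should lase in the same phase.
J = np.array([[0.0, 1.0], [1.0, 0.0]])
spins = cim_solve(J)
print(spins)  # both entries share the same sign
```

The key qualitative feature survives the simplification: below threshold the amplitudes decay toward zero, and as the ramp crosses threshold the ferromagnetically coupled collective mode grows first, so the final signs agree.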

 

We conjecture that such analysis can provide fundamental insight into what makes certain optimization instances hard or easy for coherent Ising machines, and hope that our approach can lead both to improvements of the core CIM algorithm and to a pre-processing rubric for rapidly assessing the CIM suitability of new instances. To provide a bit of intuition about how this all works, it may help to consider the threshold dynamics of just one or two optical parametric oscillators (OPOs). In the CIM architecture just described, we can think of each of the pulse time slots circulating around the fiber ring as representing an independent OPO. We can think of a single OPO degree of freedom as a single resonant optical mode that experiences linear dissipation due to out-coupling loss, and gain in a pumped nonlinear crystal, as shown in the diagram on the upper left of the slide.

 

As the pump power is increased from zero, as in the CIM algorithm, the nonlinear gain is initially too low to overcome the linear dissipation, and the OPO field remains in a near-vacuum state. At a critical threshold value, gain equals dissipation, and the OPO undergoes a sort of lasing transition; the steady states of the OPO above this threshold are essentially coherent states. There are actually two possible values of the OPO coherent amplitude at any given above-threshold pump power, which are equal in magnitude but opposite in phase. When the OPO crosses this threshold, it basically chooses one of the two possible phases randomly, resulting in the generation of a single bit of information. If we consider two uncoupled OPOs, as shown in the upper right diagram, pumped at exactly the same power at all times, then as the pump power is increased through threshold, each OPO will independently choose a phase, and thus two random bits are generated. For any number of uncoupled OPOs, the threshold power per OPO is unchanged from the single-OPO case.
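The pitchfork structure of this transition can be illustrated with the toy amplitude equation dx/dt = (p - 1)x - x^3, where the loss rate is normalized to 1. This normalization and the cubic saturation term are assumptions for illustration, not the exact hardware model:

```python
import math

def opo_steady_states(p):
    """Steady states of the toy OPO amplitude equation dx/dt = (p-1)x - x^3
    (loss rate normalized to 1). Below threshold (p <= 1) only x = 0 is
    stable; above threshold two amplitudes +/- sqrt(p-1) appear, equal in
    magnitude but opposite in sign (i.e., opposite in phase)."""
    if p <= 1.0:
        return [0.0]
    a = math.sqrt(p - 1.0)
    return [-a, a]

print(opo_steady_states(0.5))  # [0.0]
print(opo_steady_states(2.0))  # [-1.0, 1.0]
```

Which of the two above-threshold branches the OPO lands on is decided by noise at the moment threshold is crossed, which is exactly the single random bit described above.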

 

Now, however, consider a scenario in which the two OPOs are coupled to each other by mutual injection of their out-coupled fields, as shown in the diagram on the lower right. One can imagine that, depending on the sign of the coupling parameter alpha, when one OPO is lasing, it will inject a perturbation into the other that may interfere either constructively or destructively with the field that the other is trying to generate via its own lasing process. As a result, we can easily show that for alpha positive there is an effective ferromagnetic coupling between the two OPO fields, and their collective oscillation threshold is lowered from that of the independent-OPO case, but only for the two collective oscillation modes in which the two OPO phases are the same. For alpha negative, the collective oscillation threshold is lowered only for the configurations in which the OPO phases are opposite.
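A linearized sketch of these collective thresholds follows, with the single-OPO loss rate normalized to 1; this normalization and the form of the coupling matrix are assumptions for illustration, not taken from the talk:

```python
import numpy as np

def collective_thresholds(alpha):
    """Linearized gain model for two mutually injected degenerate OPOs:
    dx/dt = (p - 1) x + alpha * C x, with C the mutual-injection coupling.
    A collective mode with coupling eigenvalue lam starts to oscillate
    once p > 1 - lam, so positive-lam modes see a lowered threshold."""
    C = np.array([[0.0, 1.0], [1.0, 0.0]])
    lams, vecs = np.linalg.eigh(alpha * C)
    # Return (threshold pump, mode shape) pairs, lowest threshold first.
    return sorted((1.0 - lam, vecs[:, i]) for i, lam in enumerate(lams))

for p_th, mode in collective_thresholds(alpha=0.3):
    print(round(p_th, 2), np.sign(mode))
```

For alpha = 0.3 the in-phase mode reaches threshold at p = 0.7 while the out-of-phase mode requires p = 1.3; flipping the sign of alpha swaps the two, reproducing the ferromagnetic/antiferromagnetic distinction described above.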

 

So then, looking at how alpha is related to the Jij matrix of the Ising spin-coupling Hamiltonian, it follows that we could use this simplistic two-OPO CIM to solve the ground state problem of the ferromagnetic or antiferromagnetic n = 2 Ising model, simply by increasing the pump power from zero and observing what phase relation occurs as the two OPOs first start to lase. Clearly we can imagine generalizing this story to larger n; however, the story doesn’t stay as clean and simple for all larger problem instances. To find a more complicated example, we only need to go to n = 4. For some choices of Jij for n = 4, the story remains simple, like the n = 2 case. The figure on the upper left of this slide shows the energy of various critical points for a non-frustrated n = 4 instance, in which the first bifurcated critical point, that is, the one that bifurcates at the lowest pump value, flows asymptotically into the lowest-energy Ising solution.

 

In the figure on the upper right, however, the first bifurcated critical point flows to a very good, but suboptimal, minimum at large pump power. The global minimum is actually given by a distinct critical point that first appears at a higher pump power and is not adiabatically connected to the origin. The basic CIM algorithm is thus not able to find this global minimum. Such non-ideal behavior seems to become more common at larger n. For the n = 20 instance shown in the lower plots, where the lower right plot is just a zoom into a region of the lower left plot, it can be seen that the global minimum corresponds to a critical point that first appears at a pump parameter of around 0.16, at some distance from the adiabatic trajectory of the origin. It is curious to note that in both of these small-n examples, however, the critical point corresponding to the global minimum appears relatively close to the adiabatic trajectory of the origin, as compared to most of the other local minima that appear. We’re currently working to characterize the phase portrait topology between the global minimum and the adiabatic trajectory of the origin, seeking clues as to how the basic CIM algorithm could be generalized to search for non-adiabatic trajectories that jump to the global minimum during the pump ramp.

 

Of course, n = 20 is still too small to be of interest for practical optimization applications, but the advantage of beginning with the study of small instances is that we are able to reliably determine their global minima and to see how they relate to the adiabatic trajectory of the origin in the basic CIM algorithm. In the small-n limit, we can also analyze full quantum mechanical models of CIM dynamics, but that’s a topic for future talks. Existing large-scale prototypes are pushing into the range of n = 10^4 to 10^6, so our ultimate objective in theoretical analysis really has to be to try to say something about CIM dynamics in a regime of much larger n.

 

Our initial approach to characterizing CIM behavior in the large-n regime relies on the use of random matrix theory, and thus connects to prior research on spin glasses, SK models, the TAP equations, et cetera. At present we’re focusing on statistical characterization of the CIM gradient-descent landscape, including the evolution of critical points and their eigenvalue spectra as the pump power is gradually increased. We’re investigating, for example, whether there could be some way to exploit differences in the relative stability of the global minimum versus other local minima. We’re also working to understand the deleterious, or potentially beneficial, effects of non-idealities such as asymmetry in the implemented Ising couplings. Looking one step ahead, we plan to move next in the direction of considering more realistic classes of problem instances, such as quadratic binary optimization with constraints.

 

So in closing, I should acknowledge the people who did the hard work on the things I’ve shown. My group, including graduate students Edwin Ng, Daniel Wennberg, Ryatatsu Yanagimoto, and Atsushi Yamamura, has been working in close collaboration with Surya Ganguli, Marty Fejer, and Amir Safavi-Naeini, all of us within the Department of Applied Physics at Stanford University, and also in collaboration with Yoshihisa Yamamoto over at NTT-PHI Research Labs. And I should acknowledge funding support from the NSF through the Coherent Ising Machines Expeditions in Computing, and also from NTT-PHI Research Labs, the Army Research Office, and ExxonMobil. That’s it. Thanks very much.
