## Timothee Leleu

**University of Tokyo**

Neuromorphic in Silico Simulator For the Coherent Ising Machine

Transcript of the presentation *Neuromorphic in Silico Simulator For the Coherent Ising Machine,* given at the NTT Upgrade 2020 Research Summit, September 29, 2020

Hi everyone. This is Timothee Leleu from the University of Tokyo. Before I start, I would like to thank Yoshi and all the staff of NTT for the invitation and the organization of this online meeting. I also would like to say that it has been very exciting to see the growth of this new PHI Lab, and I am happy to share with you today some of the recent works that have been done either by me or by the [unintelligible] group. The title of my talk is "Neuromorphic in silico simulator for the coherent Ising machine," and here is the outline. I would like to make the case that the simulation in digital electronics of the CIM can be useful for either better understanding or improving its functioning principles, and for this I will be introducing some ideas from neural networks. This is what I will discuss in the first part. Then I will show some proof of concept of the gain in performance that can be obtained using this simulation, in the second part, and the projection of the performance that can be achieved using a very large-scale simulator, in the third part. Finally, I will talk about the future plans.

So first let me start by comparing recently proposed Ising machines using this table, which is adapted from a recent Nature Electronics paper by the Hewlett-Packard people. This comparison shows that there is always a trade-off between energy efficiency, speed, and scalability that depends on the physical implementation. In red here are the limitations of each of these hardware platforms, and interestingly, the FPGA-based systems, such as the Fujitsu Digital Annealer, the Toshiba bifurcation machine, or the recently proposed restricted Boltzmann machine on FPGA by a group in Berkeley, offer a good compromise between speed and scalability. This is why, despite the unique advantages that some of the other hardware have, such as the quantum effects of the flux qubits or the energy efficiency of the memristors, FPGAs are still a promising platform for building large-scale Ising machines in the near future.

The reason for the good performance of FPGAs is not so much that they operate at a high frequency, nor that they are particularly energy efficient, but rather that the physical wiring of their elements can be reconfigured in a way that limits the von Neumann bottleneck, the large fan-ins and fan-outs, and the long propagation of information within the system. In this respect, FPGAs are interesting from the perspective of the physics of complex systems, rather than the physics of electrons or photons.

So to put the performance of these various hardware platforms into perspective, we can look at the computation occurring in the brain. The brain computes using billions of neurons with only about 20 watts of power and operates at a relatively slow frequency. It is these impressive characteristics that motivate us to investigate what kind of neuro-inspired principles could be useful for designing better Ising machines. The idea of this research project, and of the future collaboration, is to temporarily alleviate the limitations that are intrinsic to the realization of an optical coherent Ising machine, as shown in the top panel here, by designing a large-scale simulator in silico, in the bottom here, that can be used for investigating better organizing principles for the CIM.

And in this talk we will focus on three neuro-inspired principles: 1) The asymmetry of connections. Neural dynamics are often chaotic because of the asymmetry in the connectivity. 2) The micro-structure. Neural networks are not composed of the repetition of always the same type of neuron; rather, there is a local structure that is repeated. Here is a schematic of a micro-column in the cortex. 3) Lastly, the hierarchical organization of connectivity. Connectivity is organized in a tree structure in the brain, and here you see a representation of the hierarchical organization of the monkey's cerebral cortex. So how can these principles be used to improve the performance of Ising machines and their in silico simulation?

So first, about the two principles of asymmetry and micro-structure: we know that the classical approximation of the coherent Ising machine is equivalent to rate-based neural networks. In the case of the CIM, the classical approximation can be obtained using the truncated Wigner representation, for example. The dynamics of both of these systems can be described by the following ordinary differential equations, in which, in the case of the CIM, x_i represents the in-phase component of the i-th DOPO; f represents the nonlinear optical part, the degenerate optical parametric amplification; and the sum over j of ω_ij x_j represents the coupling, which is realized, in the case of the measurement-feedback CIM, using homodyne detection, an FPGA, and then injection of the coupling term. These dynamics, in both the case of the CIM and that of neural networks, can be written as gradient descent of a potential function V, written here, and this potential function V includes the Ising Hamiltonian.

So this is why it is natural to use this type of dynamics to solve Ising problems, in which the ω_ij are the Ising couplings and H is the extension of the Ising Hamiltonian to the analog state. Note that this potential function can only be defined if the ω_ij are symmetric. The well-known problem of this approach is that the potential function V that we obtain is very nonconvex at low temperature, and so one strategy is to gradually deform this landscape using some annealing process; but unfortunately, there is no theorem that guarantees convergence to the global minimum of the Ising Hamiltonian using this approach. This is why we propose to introduce a micro-structure into the system, where one analog spin, or one DOPO, is replaced by a pair of one analog spin and one error-encoding variable.
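The gradient structure described above can be sketched as follows, assuming a cubic saturation nonlinearity f(x) = (p - 1)x - x^3 for the DOPO gain (the exact form shown on the slides is not recoverable from the transcript; p denotes the pump parameter):

```latex
\begin{align}
\frac{dx_i}{dt} &= (p-1)\,x_i - x_i^3 + \sum_j \omega_{ij} x_j
                 = -\frac{\partial V}{\partial x_i},\\
V(\mathbf{x}) &= \sum_i \left( \frac{x_i^4}{4} + \frac{(1-p)\,x_i^2}{2} \right)
               \;-\; \frac{1}{2}\sum_{i,j} \omega_{ij}\, x_i x_j .
\end{align}
```

The last term of V is the Ising Hamiltonian evaluated on the analog state, and the gradient form holds only when the ω_ij are symmetric, as noted above.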

And the addition of this micro-structure introduces asymmetry into the system, which in turn induces chaotic dynamics, a chaotic search rather than an annealing process, for searching for the ground state of the Ising Hamiltonian. Within this micro-structure, the role of the error variable is to control the amplitudes of the analog spins, forcing them to become equal to a certain target amplitude a, and this is done by modulating the strength of the Ising coupling: the error variable e_i multiplies the Ising coupling here in the dynamics of the DOPO. The whole dynamics is then described by this couple of equations. Because the e_i do not necessarily all take the same value for the different i, this introduces asymmetry into the system, which in turn creates the chaotic dynamics shown here for solving an SK problem of a certain size, in which the x_i are shown here, the e_i here, and the value of the Ising energy in the bottom plot.
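As a concrete illustration of this couple of equations, here is a minimal sketch in Python of the analog-spin and error-variable dynamics described above, integrated with a simple Euler scheme. The parameter values (p, a, beta, step size) are illustrative assumptions, not the ones used on the FPGA:

```python
import numpy as np

def simulate_cim_with_error_correction(J, p=1.0, a=1.0, beta=0.3,
                                       dt=0.01, steps=20000, seed=0):
    """Euler integration of the coupled ODEs sketched in the talk:
        dx_i/dt = x_i * (p - 1 - x_i^2) + e_i * sum_j J_ij x_j
        de_i/dt = -beta * e_i * (x_i^2 - a)
    Returns the best Ising energy found along the trajectory and the
    corresponding spin configuration sign(x)."""
    n = J.shape[0]
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(n)   # analog spin amplitudes
    e = np.ones(n)                      # error-correction variables
    best_energy, best_spins = np.inf, None
    for _ in range(steps):
        coupling = J @ x
        x += dt * (x * (p - 1.0 - x**2) + e * coupling)
        e += dt * (-beta * e * (x**2 - a))
        s = np.sign(x)
        energy = -0.5 * s @ J @ s       # Ising energy H = -1/2 sum_ij J_ij s_i s_j
        if energy < best_energy:
            best_energy, best_spins = energy, s.copy()
    return best_energy, best_spins

# small random +-1 SK-type instance (symmetric, zero diagonal)
rng = np.random.default_rng(1)
n = 20
J = rng.choice([-1.0, 1.0], size=(n, n))
J = np.triu(J, 1)
J = J + J.T
energy, spins = simulate_cim_with_error_correction(J)
print(energy)
```

Because the e_i evolve independently, the effective coupling e_i J_ij is no longer symmetric, which is exactly the asymmetry that produces the chaotic search instead of gradient descent.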

And you see this chaotic search that visits various local minima of the Ising Hamiltonian and eventually finds the global minimum. It can be shown that this modulation of the target amplitude can be used to destabilize all the local minima of the Ising Hamiltonian, so that we do not get stuck in any of them. Moreover, the other types of attractors that can eventually appear, such as [unintelligible] attractors, can also be destabilized using a modulation of the target amplitude. We have proposed in the past two different modulations of the target amplitude. The first one is a modulation that ensures that the entropy production rate of the system becomes positive, which forbids the creation of any non-trivial attractors. But in this work I will talk about another modulation, a heuristic modulation, which is given here, that works as well as the first one but is easier to implement on the FPGA.

So this couple of equations, which represents the simulation of the coherent Ising machine with error correction, can be implemented especially efficiently on an FPGA, and here I show the time that it takes to simulate this system. In red, you see the time that it takes to compute the x_i term, the e_i term, the dot product, and the Ising Hamiltonian, for a system with 500 analog spins, equivalently 500 DOPOs. On the FPGA, the nonlinear dynamics, which is equivalent to the degenerate optical parametric amplification, the DOPA, of the CIM, can be completed in only 13 clock cycles at 300 megahertz, which corresponds to about 0.1 microseconds. This is to be compared to what can be achieved in the measurement-feedback CIM, in which, if we want to get five hundred time-multiplexed DOPOs with a 1 GHz repetition rate through the optical cavity, we would require 0.5 microseconds to do this. So the simulation on FPGA can be at least as fast as a 1 GHz repetition rate pulsed-laser CIM. Then the dot product that appears in this differential equation can be completed in 43 clock cycles, that is to say, one microsecond at 15 megahertz.

So, at least for problem sizes larger than 500 spins, the dot product clearly becomes the bottleneck. This can be seen by looking at the scaling of the number of clock cycles that it takes to compute either the nonlinear optical part or the dot product, with respect to the problem size N. If we had an infinite amount of resources on the FPGA to simulate these dynamics, then the nonlinear part could be done in O(1), and the matrix-vector product could be done in a time that scales as the logarithm of N. Why the logarithm of N? Because computing the dot product involves summing all the terms in the dot product, which is done on an FPGA by an adder tree, whose height scales logarithmically with the size of the system.
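To make the logarithmic scaling concrete, here is a small Python sketch of a binary adder tree, under the (idealized) assumption that each level of the tree costs one clock cycle in hardware:

```python
import math

def adder_tree_depth(n):
    """Number of pairwise-addition stages needed to sum n terms
    with a binary adder tree: ceil(log2(n))."""
    return math.ceil(math.log2(n))

def tree_sum(values):
    """Sum a list the way an adder tree would: pairwise, level by level.
    Returns the total and the number of levels used."""
    level = list(values)
    depth = 0
    while len(level) > 1:
        level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth

total, depth = tree_sum(range(500))
print(total, depth)   # sums 0..499 in ceil(log2(500)) = 9 levels
```

So summing 500 coupling terms takes 9 tree levels instead of 499 sequential additions, which is where the log N clock-cycle count comes from.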

So this indicates what happens if we had an infinite amount of resources on the FPGA; but for dealing with larger problems, of more than about 100 spins usually, we need to decompose the matrix into smaller blocks, with a block size that is noted U here, and then the scaling becomes, for the nonlinear part, linear in N/U, and for the dot product, quadratic in N/U. Typically, for a low-end, cheap FPGA, U, the block size of this matrix, is about 100. So clearly we want to make U as large as possible in order to maintain this scaling in log N for the number of clock cycles needed to compute the dot product, rather than the quadratic scaling in N that occurs if we decompose the matrix into smaller blocks.
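A rough cost model for this block decomposition, as a sketch (this is my own simplified model for illustration, not the talk's exact cycle counts): with block size U there are about (N/U)^2 blocks to process for the matrix-vector product, each reduced by a log2(U)-level adder tree:

```python
import math

def dot_product_cycles(n, u):
    """Illustrative clock-cycle count for an n x n matrix-vector product
    decomposed into u x u blocks processed sequentially: (n/u)^2 blocks,
    each reduced by an adder tree of depth ceil(log2(u))."""
    blocks = math.ceil(n / u) ** 2
    return blocks * max(1, math.ceil(math.log2(u)))

# with u = n (no decomposition) the cost is just the adder-tree depth,
# i.e. logarithmic in n; with a fixed u it grows quadratically in n
print(dot_product_cycles(500, 500))   # 9
print(dot_product_cycles(500, 100))   # 25 blocks * 7 levels = 175
```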

But the difficulty in having these larger blocks is that a very large adder tree introduces large fan-ins and fan-outs and long-distance data paths within the FPGA. So the solution for getting higher performance from a simulator of the coherent Ising machine is to get rid of this bottleneck for the dot product by increasing the size of this adder tree, and this can be done by organizing the elements hierarchically within the FPGA, which is shown here in this right panel, in order to minimize the fan-ins and fan-outs of the system, and to minimize the long-distance data paths in the FPGA.

So I am not going into the details of how this is implemented on the FPGA; this was just to give you an idea of why the hierarchical organization of the system becomes extremely important for getting good performance in a simulator of the Ising machine. Instead of getting into the details of the FPGA implementation, I would like to give a few benchmark results for this simulator, which was used as a proof of concept for this idea and which can be found in this arXiv paper here. Here I show results for solving SK problems, that is, fully connected, random, plus-or-minus-one spin-glass problems. As a metric, we use the number of matrix-vector products, since this is the bottleneck of the computation, needed to reach the optimal solution of the SK problem with 99% success probability, plotted against the problem size N.

And in red here is the proposed FPGA implementation, while in blue is the number of matrix-vector products that are required for the CIM without error correction to solve this SK problem, and in green here, for noisy mean-field annealing, which behaves somewhat similarly to the coherent Ising machine. So clearly you see that the number of matrix-vector products needed to solve this problem scales with a better exponent than the other approaches. That is an interesting feature of the system.

And next we can see the real time to solution for solving these SK instances. The y-axis here is the time to solution in seconds to find the ground state of SK instances with 99% success probability, for different state-of-the-art hardware. In red is the FPGA implementation proposed in this paper, and the other curves represent breakout local search, in orange, and simulated annealing, in purple, for example. You see that the scaling of this proposed simulator is rather good, and that for larger problem sizes we can get orders of magnitude faster than the other state-of-the-art approaches. Moreover, the relatively good scaling of the time to solution with respect to problem size indicates that this FPGA implementation would be faster than other recently proposed Ising machines, such as the Hopfield neural network implemented on memristors, in blue here.
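The time-to-solution metric used in plots like these is typically computed as follows, assuming independent restarts of a solver that succeeds with some probability per run (a standard definition, sketched here for reference):

```python
import math

def time_to_solution(t_run, p_success, target=0.99):
    """Expected wall-clock time to reach the ground state at least once
    with probability `target`, given independent runs of duration t_run,
    each succeeding with probability p_success."""
    if p_success >= target:
        return t_run
    return t_run * math.log(1.0 - target) / math.log(1.0 - p_success)

print(time_to_solution(1.0, 0.5))  # a 50%-per-run solver needs ~6.64 runs
```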

It is very fast for small problem sizes, but its scaling is not good. The same is true for the restricted Boltzmann machine implemented on FPGA, proposed by a group in Berkeley recently: again, very fast for small problem sizes, but with bad scaling, at least worse than the proposed approach, so that we can expect that for problem sizes larger than, let's say, about 1,000 spins, the proposed approach will be the faster one.

Let me jump to this other slide. Another confirmation that the scheme scales well is that we can find the maximum cut values of benchmark sets, the G-SET, that have been previously found by other algorithms; these are the best known cut values, to the best of our knowledge. Moreover, as shown in this table here, for instances 14 and 15 of this G-SET in particular, we can find better cut values than previously known.

And we can find these cut values about a hundred times faster than the state-of-the-art algorithm on CPU for this task, which is breakout local search. Note that getting these good results on the G-SET does not require particularly hard tuning of the parameters; the parameter tuning here is very simple, and just depends on the degree of connectivity within each graph. So these good results on the G-SET indicate that the proposed approach would be good not only at solving SK problems, but also other types of graphs, Ising problems, or max-cut problems [unintelligible].
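For reference, the max-cut objective evaluated on these G-SET instances relates to an Ising spin configuration through a standard identity (not specific to this work), sketched here:

```python
import numpy as np

def cut_value(W, s):
    """Cut value of spin configuration s (entries +-1) on a weighted graph
    with symmetric, zero-diagonal weight matrix W:
    cut(s) = 1/2 * sum_{i<j} W_ij * (1 - s_i s_j) = (sum(W) - s^T W s) / 4."""
    s = np.asarray(s, dtype=float)
    return 0.25 * (W.sum() - s @ W @ s)

# toy example: a unit-weight triangle; the best cut separates one vertex
W = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
print(cut_value(W, [1, 1, -1]))  # 2.0
```

So maximizing the cut is the same as minimizing the Ising energy with couplings given by the edge weights, which is why an Ising machine applies directly to these benchmarks.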

So, given that the performance of the FPGA design depends on the height of this adder tree, we can try to maximize the height of this adder tree on a large FPGA, and by carefully routing the individual components within the FPGA, we can draw some projections of the type of performance we can achieve in the near future, based on the implementation that we are currently working on. Here you see projections of the time to solution with 99% success probability for solving these SK problems, with respect to the problem size N, compared to different recently proposed Ising machines, in particular the Digital Annealer, which is shown in green here, the green line without dots. We show two different hypotheses for this projection: either that the time to solution scales as an exponential of N, or that it scales as an exponential of the square root of N.

So, it seems, according to the data, that the time to solution scales more like an exponential of the square root of N, though we cannot be sure of this. And this projection shows that we could probably solve SK problems of size 2,000 spins, finding the true ground state with 99% success probability, in about 10 seconds, which is much faster than all the other proposed approaches.
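The two scaling hypotheses can be distinguished by fitting log(TTS) against the square root of N; here is a minimal sketch of such a fit (the data and coefficients below are synthetic, made up purely for illustration):

```python
import numpy as np

def fit_exp_sqrt(ns, tts):
    """Least-squares fit of log(TTS) = a + b * sqrt(N), i.e. the
    exp(sqrt(N)) scaling hypothesis. Returns the coefficients (a, b)."""
    A = np.vstack([np.ones(len(ns)), np.sqrt(ns)]).T
    coef, *_ = np.linalg.lstsq(A, np.log(tts), rcond=None)
    return coef

# synthetic illustration: data generated with a = 0.5, b = 0.2
ns = np.array([100, 400, 900, 1600])
tts = np.exp(0.5 + 0.2 * np.sqrt(ns))
a, b = fit_exp_sqrt(ns, tts)
print(round(a, 3), round(b, 3))  # recovers 0.5 and 0.2
```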

So what are the future plans for this coherent Ising machine simulator? The first thing is that we would like to make the simulation closer to the real DOPO optical system; in particular, as a first step, to get closer to the system of the measurement-feedback CIM. To do this, what is simulatable on the FPGA is the quantum Gaussian model that is described in this paper, proposed by people in the NTT group. The idea of this model is that, instead of having the very simple ODEs that I showed previously, it includes ODEs that take into account not only the mean, the ensemble average, of the DOPO in-phase components, but also their variance, so that we can take into account more of the quantum effects of the DOPOs, such as squeezing.

And then we plan to make the simulator open access, for members to run their instances on the system. There will be a first version in September that will be based on simple command-line access to the simulator, in which we will have just the classical approximation of the system, with binary weights and no Zeeman term. But then we will propose a second version that will extend the current Ising machine to a rack of eight FPGAs, in which we will add the more refined models, the truncated Wigner and the quantum Gaussian model that I just talked about, and which will support real-valued weights for the Ising problems as well as the Zeeman term. We will announce later when this is available; Farad is working hard to get the first version out sometime in September. Thank you all, and we will be happy to answer any questions that you have.
