Dirk Englund

Associate Professor of Electrical Engineering and Computer Science at MIT

Photonic Accelerators for Machine Intelligence

Transcript of the presentation Photonic Accelerators for Machine Intelligence, given at the NTT Upgrade 2020 Research Summit, September 29, 2020

Hi, my name is Dirk Englund, and I am an Associate Professor of Electrical Engineering and Computer Science at MIT. It’s been fantastic to be part of this team that Professor Yamamoto put together for the NTT PHI program, and it’s a great pleasure to report to you our update from the first year. I will talk to you today about our recent work in Photonic Accelerators for Machine Intelligence. And you can already get a flavor of the kind of work that I’ll be presenting from the photonic integrated circuit shown here, a photonic matrix processor that we are developing to try to break some of the bottlenecks that we encounter in machine learning inference tasks, in particular tasks like vision, game control, or language processing.

This work is jointly led with Dr. Ryan Hamerly, a scientist at NTT Research, and he will have a poster that you should check out at this conference. I should also say that there are postdoc positions available; just take a look at the announcements on QPLAB.mit.edu. So, if you look at these machine learning applications and look under the hood, you see that a common feature is that they use these artificial neural networks, or ANNs, where you have an input layer of, let’s say, N neurons and values that is connected to the first layer of, let’s say, also N neurons. Connecting the first and second layer, if you represented it by a matrix, would require an N x N matrix that has of order N squared free parameters.

Okay. Now, in traditional machine learning inference, you would have to grab these N squared values from memory, and every time you do that it costs quite a lot of energy. Okay, maybe you can batch, but it’s still quite costly in energy. And moreover, each of the input values has to be multiplied by that matrix, and if you multiply an N-by-one vector by an N-by-N matrix, you have to do of order N squared multiplications. Okay. Now, on a digital computer, you therefore have to do of order N squared operations and memory accesses, which can be quite costly. But the proposition is that on photonic integrated circuits, perhaps, we could do that matrix-vector multiplication directly on the PIC itself, by encoding optical fields and sending them through a programmed interferometer, and the output then would be the product of the matrix multiplied by the input vector.
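To make that counting concrete, here is a short sketch of the digital baseline; the layer size, weights, and inputs below are illustrative values, not numbers from the talk:

```python
import numpy as np

N = 64                       # neurons per layer (illustrative)
W = np.random.randn(N, N)    # weight matrix: N*N free parameters
x = np.random.randn(N)       # input activation vector

# On a digital computer, y = W @ x costs of order N^2
# multiply-accumulates, and all N^2 weights must be fetched
# from memory (absent batching or caching).
y = W @ x
macs = N * N
print(macs)  # 4096 multiply-accumulates for one 64-neuron layer
```

The proposal in the talk is to do exactly this product in the optical domain, so the N squared multiplications happen "for free" as light propagates through the programmed interferometer.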

And that is actually the experiment we did, demonstrating that this is in principle possible, back in 2017 in a collaboration with Professor Marin Soljacic. Now, if you look a little bit more closely at the device as shown here, it consists of a silicon layer that is patterned into waveguides. We do this with a foundry; this was fabricated through the OpSIS foundry, and many thanks to our collaborators who’ve helped make that possible. And this thing guides light, and out of these waveguides we make these 2 x 2 transformations, Mach-Zehnder interferometers as they’re called: two input waveguides coming in, two output waveguides going out, and by having two phase settings here, theta and phi, we can control any arbitrary SU(2) rotation.

Now, if I want to have N modes coming in and N modes coming out, that can be represented by an SU(N) unitary transformation. And that’s what this kind of chip allows you to do. And that’s the key ingredient that really launched this in my group. I should, at this point, acknowledge the people who’ve made this possible, in particular Liane Bernstein and Alex Sludds, as well as Ryan Hamerly once more. Also these other collaborators, Prabhu, and most importantly Marin Soljacic. And of course our funding, in particular the NTT Research funding.
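As a sketch of how the two phases give an SU(2) building block, here is one common Mach-Zehnder parameterization in Python; conventions differ between papers, so this is an assumed convention, not necessarily the exact one used on the chip:

```python
import numpy as np

def mzi(theta, phi):
    """2x2 Mach-Zehnder interferometer unitary, in one common
    convention: an external phase phi on one input arm, then an
    internal phase theta sandwiched between two 50:50 beamsplitters."""
    bs = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # 50:50 beamsplitter
    internal = np.diag([np.exp(1j * theta), 1.0])   # internal phase theta
    external = np.diag([np.exp(1j * phi), 1.0])     # external phase phi
    return bs @ internal @ bs @ external

U = mzi(0.7, 1.3)
# The result is unitary (U @ U† = I), so a triangular or rectangular
# mesh of such MZIs can realize an arbitrary N x N unitary
# (Reck/Clements-style decompositions).
print(np.allclose(U @ U.conj().T, np.eye(2)))  # True
```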

So why optics? Optics has failed many times before in building computers, so why is this different? And I think the difference is that we’re not trying to build an entirely new computer out of optics. We’re selective in how we apply optics. We should use optics for what it’s good at, and that’s probably not so much the nonlinearity, and not memory.

Communication and fan-out are great in optics. And, as we just said, linear algebra you can do in optics. Fantastic. Okay, so you should make use of these things and then combine them judiciously with electronic processing to see if you can get an advantage in the entire system out of it. Okay. And before I move on: based on the 2017 paper, two startups were created, Lightelligence and Lightmatter, which two students from my group, Nick Harris and Darius Bunandar, co-founded jointly with Marin.

And after just about two years, they’ve been able to create their first device, the first large-scale matrix processor. This device, called Mars, has 64 input modes, 64 output modes, and full programmability under the hood. Okay. Because they’re integrating waveguides directly with CMOS electronics, they were able to get all the wiring complexity dealt with, all the feedback and so forth, and this device is now able to process a 64 x 64 unitary matrix on the fly. Okay. The parameters: three watts total power consumption, and a latency, how long it takes for a matrix to be multiplied by a vector, of less than a nanosecond.

And because this device works well over a pretty large band of 20 gigahertz, you can just put in many channels that are individually at one gigahertz. So you can have tens of these 64 x 64 rotations done simultaneously. If you do the sort of back-of-the-envelope physics, that gives you, per multiply-accumulate, just tens of [unintelligible] at the moment. So that’s very, very competitive. That’s awesome. Okay. So you see, potentially, the breakthroughs that are enabled by photonics here. And actually, one thing that made it possible, which is very cool, is that these phase shifters have no hold power. Whereas our phase shifters still use thermo-optic modulation, these use nanoscale mechanical modulators that have no hold power. So once you program a unitary, you can just hold it there, with no energy consumption added over time.

So photonics really is on the rise in computing. But once again, you have to be careful in how you compare against electronics, to find where the gain is to be had. So what I’ve talked about so far is weight-stationary photonic processing. Okay. Up until here. Now, electronics has that also, but it doesn’t have the benefits of the coherence of the optical fields transitioning through this matrix, nor the bandwidth. Okay. So that, I think, is a really exciting direction, and these companies are off, and they’re building these chips, and we’ll see in the next couple of months how well this works.

A different direction is to have an output-stationary matrix-vector multiplication. And for this I want to point to this paper we wrote with Ryan Hamerly and the other team members, which projects the activation functions together with the weight terms onto a detector array. By the interference of the activation function and the weight term, by homodyne detection, it is possible, if you think about homodyne detection, that it actually automatically produces the multiplication: the interference term between two optical fields gives you the multiplication between them. And that’s what this is making use of.
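That homodyne-multiplication idea can be checked with a one-line calculation; the field amplitudes below are made up for illustration:

```python
import numpy as np

# Balanced homodyne "photoelectric multiplication" sketch:
# interfere an activation field a with a weight field w on a 50:50
# beamsplitter; the *difference* of the two detector powers is the
# interference term, proportional to the product a*w (for real fields).
a, w = 0.8, -1.5               # amplitudes encoding x_i and W_ij

plus = (a + w) / np.sqrt(2)    # one beamsplitter output
minus = (a - w) / np.sqrt(2)   # the other output
diff_photocurrent = plus**2 - minus**2  # balanced detection

print(diff_photocurrent)                 # 2 * a * w = -2.4
print(np.isclose(diff_photocurrent, 2 * a * w))  # True
```

Summing such difference photocurrents on one detector over all the incoming terms accumulates the dot product, which is why the output can stay stationary on the detector array.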

And I want to talk a bit more about that approach. So we actually did a careful analysis in that PRX paper that was cited on the last page, and that analysis of the energy consumption showed that this device, in principle, can compute at an energy per multiply-accumulate that is below what you could theoretically do at room temperature using an irreversible computer, like the digital computers that we use in everyday life.

I want to illustrate that with this plot here. What this is showing, on the horizontal axis, is the number of neurons that you have per layer, and on the vertical axis is the energy per multiply-accumulate in joules. When we make use of the massive fan-out together with this photoelectric multiplication by coherent detection, we estimate that we’re on this curve here. Since our energy consumption scales as N, whereas for a digital computer it goes as N squared, we gain more as you go to larger matrices. So for the largest matrices, like matrices of scale 1,000 x 1,000, even with present-day technology we estimate that we would hit an energy per multiply-accumulate of about a femtojoule.
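A toy version of that scaling argument, with purely assumed per-operation energies rather than the measured or projected numbers from the PRX paper:

```python
# Back-of-the-envelope scaling from the talk: for an N x N
# matrix-vector product, the optical energy grows only as N (you pay
# to generate and detect the N inputs and N outputs), while a digital
# processor pays of order N^2 (one memory fetch + MAC per weight).
# Both per-operation constants below are assumptions for illustration.
E_OPT_PER_MODE = 1e-13   # J per optical input/output mode (assumed)
E_DIG_PER_MAC = 1e-12    # J per digital multiply-accumulate (assumed)

def energy_per_mac(n):
    optical = E_OPT_PER_MODE * 2 * n / n**2  # total ~ N, spread over N^2 MACs
    digital = E_DIG_PER_MAC                  # flat per MAC, regardless of N
    return optical, digital

for n in (10, 100, 1000):
    opt, dig = energy_per_mac(n)
    print(n, opt, dig)
```

The point is only the trend: the optical energy per multiply-accumulate falls as 1/N while the digital cost stays flat, so the advantage grows linearly with matrix size.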

Okay. But if we imagine a photonic system that uses devices that have already been demonstrated individually in research papers, but not yet packaged into a large system, we would be on this curve here, where you would very quickly dip underneath the Landauer limit, which corresponds to the thermodynamic limit for doing as many bit operations as you would have to do for the same depth of neural network as we have here. And I should say that all of these numbers were computed for this simulated optical neural network having the equivalent error rate that a fully digital computer would have.

So, limited in the error by the model itself rather than by the imperfections of the devices. Okay. And we’ve benchmarked that on the [unintelligible] data set. So that was a theoretical work that looked at the scaling limits and showed that there’s quite great hope to really gain tremendously in the energy per bit, but also in the overall latency and throughput. But you shouldn’t celebrate too early: you have to really do a careful system-level study comparing electronic approaches, which oftentimes have an analogous approach, to optical approaches. And we did that in a first major step in this digital optical neural network study here, which was done together with Vivienne Sze, an electronics designer who works on CMOS electronics specifically made for machine learning acceleration, and Professor Joel Emer of MIT, who is also with Nvidia.

What we studied there in particular is: what if we replaced only the communication part with optics? Okay. And we looked at getting the same equivalent error rates that you would have on the electronic computer. And that showed that we should have a benefit for large neural networks, because large neural networks require lots of communication and eventually do not fit on a single electronic chip anymore. At that point you’d have to go longer distances, and that’s where the optical connections start to win out. So for details, I would like to point you to that study, but we’re now applying more sophisticated, full-system simulations like this to our other optical networks, to really see where the benefits are and where we can exploit them.

Now, lastly, I want to ask: what if we had nonlinearities that were actually reversible, that were quantum coherent, in fact? And we looked at that. So suppose we have the same architecture layout, but rather than having a saturable absorber, or a photodetection and an electronic nonlinearity, which is what we’ve done so far, you have an all-optical nonlinearity. Okay. Based, for example, on a Kerr medium. So suppose that we had a strong enough Kerr medium so that the output from one of these transformations can pass through it, get an intensity-dependent phase shift, and then pass into the next layer. Okay. What we did in this case is we said: okay, suppose that you have multiple layers of these Mach-Zehnder interferometer meshes. Okay. These are just like the ones that we had before, and you want to train this to do something. So suppose that task is, for example, quantum optical state compression. Okay. You have a quantum optical state, and you’d like to see: how much can I compress it while keeping the same quantum information in it? Okay. And we trained it to discover an efficient algorithm for that. We also trained it for reinforcement learning, for black-box quantum simulation, and, perhaps particularly interesting in the near term, for one-way quantum repeaters. So we said: if we have a communication network that has these quantum optical neural networks stationed some distance apart, and you come in with an optical pulse that encodes an optical qubit into many individual photons, how do I repair that multi-photon state to send the corrected optical state out the other side?
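A minimal classical sketch of such a Kerr-type layer, using a simple effective-strength model (the coupling kappa and the input fields are illustrative assumptions, not parameters from the talk):

```python
import numpy as np

def kerr_layer(fields, kappa=0.5):
    """Apply an intensity-dependent phase shift, phi = kappa * |E|^2,
    to each mode: a classical stand-in for an all-optical Kerr
    nonlinearity placed between two interferometer meshes."""
    intensity = np.abs(fields) ** 2
    return fields * np.exp(1j * kappa * intensity)

x = np.array([1.0 + 0j, 0.5j, -0.3 + 0.2j])  # illustrative mode amplitudes
y = kerr_layer(x)

# Each mode's intensity is preserved; only its phase changes with
# intensity. That is what makes the nonlinearity reversible, unlike
# photodetection followed by an electronic nonlinearity.
print(np.allclose(np.abs(y), np.abs(x)))  # True
```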

This is a one-way error-correcting scheme. We didn’t know how to build it, but we put it as a challenge to the neural network. And in simulation, we trained the neural network how to apply the weights and the matrix transformations to perform that, actually answering a challenge in the field of optical quantum networks. So that gives us motivation to try to build these kinds of nonlinearities, and we’ve done a fair amount of work on this; you can see references five through seven here. I’ve talked about these programmable photonics already, for the benchmark analysis and some of the other related work.

Please see Ryan’s poster. As I mentioned, we have ongoing work in benchmarking optical computing as part of the NTT program with our collaborators. And I think the main thing that I want to say here at the end is that the exciting thing really is that the physics tells us there are many orders of magnitude of efficiency gains to be had, if we can develop the technology to realize it. I was being conservative here with three orders of magnitude; this could be six orders of magnitude for the larger neural networks that we may have to use, and may want to use, in the future.

So the physics tells us there’s a tremendous amount of gap between where we are and where we could be. And that, I think, makes this tremendously exciting and makes the NTT PHI project so very timely. So with that, thank you for your attention, and I’ll be happy to talk about any of these topics.
