While machine learning has been around a long time, deep learning has taken on a life of its own lately. The reason for that has mostly to do with the increasing amounts of computing power that have become widely available—along with the burgeoning quantities of data that can be easily harvested and used to train neural networks.
The amount of computing power at people’s fingertips started growing in leaps and bounds at the turn of the millennium, when graphical processing units (GPUs) began to be
harnessed for nongraphical calculations, a trend that has become increasingly pervasive over the past decade. But the computing demands of deep learning have been rising even faster. This dynamic has spurred engineers to develop electronic hardware accelerators specifically targeted to deep learning, Google’s Tensor Processing Unit (TPU) being a prime example.
Here, I will describe a very different approach to this problem—using optical processors to carry out neural-network calculations with photons instead of electrons. To understand how optics can serve here, you need to know a little bit about how computers currently carry out neural-network calculations. So bear with me as I outline what goes on under the hood.
Almost invariably, artificial neurons are constructed using special software running on digital electronic computers of some sort. That software provides a given neuron with multiple inputs and one output. The state of each neuron depends on the weighted sum of its inputs, to which a nonlinear function, called an activation function, is applied. The result, the output of this neuron, then becomes an input for various other neurons.
Reducing the energy needs of neural networks might require computing with light
For computational efficiency, these neurons are grouped into layers, with neurons connected only to neurons in adjacent layers. The benefit of arranging things that way, as opposed to allowing connections between any two neurons, is that it allows certain mathematical tricks of linear algebra to be used to speed the calculations.
While they are not the whole story, these linear-algebra calculations are the most computationally demanding part of deep learning, particularly as the size of the network grows. This is true for both training (the process of determining what weights to apply to the inputs for each neuron) and for inference (when the neural network is providing the desired results).
What are these mysterious linear-algebra calculations? They aren’t so complicated really. They involve operations on
matrices, which are just rectangular arrays of numbers—spreadsheets if you will, minus the descriptive column headers you might find in a typical Excel file.
This is great news because modern computer hardware has been very well optimized for matrix operations, which were the bread and butter of high-performance computing long before deep learning became popular. The relevant matrix calculations for deep learning boil down to a large number of multiply-and-accumulate operations, whereby pairs of numbers are multiplied together and their products are added up.
Over the years, deep learning has required an ever-growing number of these multiply-and-accumulate operations. Consider