The problem, matrix multiplication, is a crucial type of calculation at the heart of many different applications, from displaying images on a screen to simulating complex physics. It is also fundamental to machine learning itself. Speeding up this calculation could have a big impact on thousands of everyday computer tasks, cutting costs and saving energy.
“This is a really amazing result,” says François Le Gall, a mathematician at Nagoya University in Japan, who was not involved in the work. “Matrix multiplication is used everywhere in engineering,” he says. “Anything you want to solve numerically, you typically use matrices.”
Despite the calculation’s ubiquity, it is still not well understood. A matrix is simply a grid of numbers, representing anything you want. Multiplying two matrices together typically involves multiplying the rows of one with the columns of the other. The basic technique for solving the problem is taught in high school. “It’s like the ABC of computing,” says Pushmeet Kohli, head of DeepMind’s AI for Science team.
But things get complicated when you try to find a faster method. “Nobody knows the best algorithm for solving it,” says Le Gall. “It’s one of the biggest open problems in computer science.”
This is because there are more ways to multiply two matrices together than there are atoms in the universe (10 to the power of 33, for some of the cases the researchers looked at). “The number of possible actions is almost infinite,” says Thomas Hubert, an engineer at DeepMind.
The trick was to turn the problem into a kind of three-dimensional board game, called TensorGame. The board represents the multiplication problem to be solved, and each move represents the next step in solving that problem. The series of moves made in a game therefore represents an algorithm.
The researchers trained a new version of AlphaZero, called AlphaTensor, to play this game. Instead of learning the best series of moves to make in Go or chess, AlphaTensor learned the best series of steps to make when multiplying matrices. It was rewarded for winning the game in as few moves as possible.
“We transformed this into a game, our favorite kind of framework,” says Hubert, who was one of the lead researchers on AlphaZero.
The researchers describe their work in a paper published in Nature today. The headline result is that AlphaTensor discovered a way to multiply together two four-by-four matrices that is faster than a method devised in 1969 by the German mathematician Volker Strassen, which nobody had been able to improve on since. The basic high school method takes 64 steps; Strassen’s takes 49 steps. AlphaTensor found a way to do it in 47 steps.
Overall, AlphaTensor beat the best existing algorithms for more than 70 different sizes of matrix. It reduced the number of steps needed to multiply two nine-by-nine matrices from 511 to 498, and the number required for multiplying two 11-by-11 matrices from 919 to 896. In many other cases, AlphaTensor rediscovered the best existing algorithm.
The researchers were surprised by how many different correct algorithms AlphaTensor found for each size of matrix. “It’s mind-boggling to see that there are at least 14,000 ways of multiplying four-by-four matrices,” says Hussein Fawzi, a research scientist at DeepMind.
Having looked for the fastest algorithms in theory, the DeepMind team then wanted to know which ones would be fast in practice. Different algorithms can run better on different hardware because computer chips are often designed for specific types of computation. The DeepMind team used AlphaTensor to look for algorithms that were tailored to Nvidia V100 GPU and Google TPU processors, two of the most common chips used for training neural networks. The algorithms that they found were 10 to 20% faster at matrix multiplication than those typically used with those chips.
Virginia Williams, a computer scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, is excited by the results. She notes that people have used computational approaches to find new algorithms for matrix multiplication for some time—and many of the existing fastest algorithms were devised in this way. But none were able to improve on long-standing results like Strassen’s.
“This new method does something completely different from what the others did,” says Williams. “It would be nice to figure out whether this new method actually subsumes all the previous ones, or whether you can combine them and get something even better.”
DeepMind now plans to use AlphaTensor to look for other types of algorithms. “It’s a new way of doing computer science,” says Kohli.