DeepMind’s streak of applying its world-class AI to hard science problems continues. In collaboration with the Swiss Plasma Center at EPFL—a university in Lausanne, Switzerland—the UK-based AI firm has now trained a deep reinforcement learning algorithm to control the superheated soup of matter inside a nuclear fusion reactor. The breakthrough, published in the journal Nature, could help physicists better understand how fusion works, and potentially speed up the arrival of an unlimited source of clean energy.
“This is one of the most challenging applications of reinforcement learning to a real-world system,” says Martin Riedmiller, a researcher at DeepMind.
In nuclear fusion, the nuclei of hydrogen atoms are forced together to form heavier nuclei, such as helium. The reaction releases a large amount of energy from a tiny amount of fuel, making it a very efficient source of power. Fusion is far cleaner and safer than fossil fuels or conventional nuclear power, which relies on fission, the splitting of heavy nuclei. It is also the process that powers stars.
Controlling nuclear fusion on Earth is hard, however. The problem is that atomic nuclei repel each other. Smashing them together inside a reactor can only be done at extremely high temperatures, often reaching hundreds of millions of degrees—hotter than the center of the sun. At these temperatures, matter is neither solid, liquid, nor gas. It enters a fourth state, known as plasma: a roiling, superheated soup of particles.
The task is to hold the plasma inside a reactor together long enough to extract energy from it. Inside stars, plasma is held together by gravity. On Earth, researchers use a variety of tricks, including lasers and magnets. In a magnet-based reactor, known as a tokamak, the plasma is trapped inside an electromagnetic cage, forcing it to hold its shape and stopping it from touching the reactor walls, which would cool the plasma and damage the reactor.
Controlling the plasma requires constant monitoring and manipulation of the magnetic field. The team trained its reinforcement-learning algorithm to do this inside a simulation. Once it had learned how to control, and change, the shape of the plasma inside a virtual reactor, the researchers gave it control of the magnets in the Variable Configuration Tokamak (TCV, from its French name, Tokamak à Configuration Variable), an experimental reactor in Lausanne. They found that the AI was able to control the real reactor without any additional fine-tuning. In total, the AI controlled the plasma for only two seconds, but this is as long as the TCV reactor can run before getting too hot.
Quick reactions
Ten thousand times a second, the trained neural network takes in 90 different measurements describing the shape and position of the plasma and adjusts the voltage in 19 magnets in response. This feedback loop is far faster than those that reinforcement-learning algorithms have previously had to handle. To speed things up, the AI was split into two neural networks. A large network, called a critic, learned inside the simulation to judge how well different control actions would work. Its judgments were used, through trial and error, to train a smaller, faster network, called an actor, that runs on the reactor itself.
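To make those numbers concrete, here is a minimal sketch of such a feedback loop in Python. It is not DeepMind's code. Only the figures of 90 measurements, 19 magnets, 10,000 steps a second, and a two-second run come from the reporting above. The `ToyActor` class, its hidden-layer size, the normalized voltage limit, and the `read_sensors` function are hypothetical stand-ins, and the weights are random rather than trained.

```python
import math
import random

N_MEASUREMENTS = 90        # sensor readings per step (from the article)
N_MAGNETS = 19             # magnet voltages set per step (from the article)
CONTROL_RATE_HZ = 10_000   # "ten thousand times a second"
SHOT_SECONDS = 2.0         # how long the AI controlled the plasma

HIDDEN_UNITS = 32          # hypothetical size, chosen only for illustration
MAX_VOLTS = 1.0            # hypothetical normalized voltage limit


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases: a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(layer, inputs):
    """Apply one fully connected layer (weighted sums plus biases)."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


class ToyActor:
    """A small feedforward network mapping measurements to magnet voltages."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.hidden = make_layer(N_MEASUREMENTS, HIDDEN_UNITS, rng)
        self.output = make_layer(HIDDEN_UNITS, N_MAGNETS, rng)

    def act(self, measurements):
        h = [math.tanh(v) for v in dense(self.hidden, measurements)]
        # tanh keeps every command inside [-MAX_VOLTS, MAX_VOLTS]
        return [MAX_VOLTS * math.tanh(v) for v in dense(self.output, h)]


def read_sensors(rng):
    """Hypothetical sensor read: random numbers in place of real diagnostics."""
    return [rng.gauss(0.0, 1.0) for _ in range(N_MEASUREMENTS)]


def run_shot(actor, n_steps, seed=1):
    """Run the measure-then-command loop for a fixed number of steps."""
    rng = random.Random(seed)
    commands = []
    for _ in range(n_steps):
        commands.append(actor.act(read_sensors(rng)))
    return commands


step_budget_us = 1e6 / CONTROL_RATE_HZ              # time allowed per decision
total_steps = int(SHOT_SECONDS * CONTROL_RATE_HZ)   # decisions in one run
actor = ToyActor()
commands = run_shot(actor, n_steps=100)             # short demo, not a full run

print(f"time budget per step: {step_budget_us:.0f} microseconds")
print(f"control steps in a {SHOT_SECONDS:.0f}-second run: {total_steps}")
print(f"voltages per step: {len(commands[0])}")
```

The arithmetic shows why the deployed network has to be small. At 10,000 steps a second, each decision must be finished within 100 microseconds, and a two-second run amounts to 20,000 consecutive decisions.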
“It’s an incredibly powerful method,” says Jonathan Citrin at the Dutch Institute for Fundamental Energy Research, who was not involved in the work. “It’s an important first step in a very exciting direction.”
The researchers believe that using AI to control plasma will make it easier to experiment with different conditions inside reactors, helping them understand the process and potentially speeding up the development of commercial nuclear fusion. The AI also learned how to control the plasma by adjusting magnets in a way that humans had not tried before, which suggests that there may be new reactor configurations to explore.
“We can take risks with this kind of control system that we wouldn’t dare take otherwise,” says Ambrogio Fasoli, director of the Swiss Plasma Center and chair of the Eurofusion Consortium. Human operators are often unwilling to push the plasma beyond certain limits. “There are events that we absolutely have to avoid because they damage the device,” he says. “If we are sure that we have a control system that takes us close to the limits but not beyond them, then we can explore more possibilities. We can accelerate research.”