OpenAI’s recent reveal of its stunning generative model Sora pushed the envelope of what’s possible with text-to-video. Now Google DeepMind brings us text-to-video games.

The new model, called Genie, can take a short description, a hand-drawn sketch or a photo and turn it into a playable video game in the style of classic 2D platformers like Super Mario Bros. But don’t expect anything fast-paced. The games run at one frame per second, compared to the typical 30-60 frames per second of most modern games.

“It’s cool work,” says Matthew Guzdial, an AI researcher at the University of Alberta, who developed a similar game generator a few years ago.

Genie was trained on 30,000 hours of video of hundreds of 2D platform games taken from the internet. Others have taken that approach before, says Guzdial. His own game generator learned from videos to create abstract platformers. Nvidia used video data to train a model called GameGAN, which could produce clones of games like Pac-Man.

But all of these examples paired video footage with input actions (button presses on a game controller): a video frame showing Mario jumping was tagged with the “jump” action, and so on. Tagging video footage with input actions takes a lot of work, however, which has limited the amount of training data available.

In contrast, Genie was trained on video footage alone. It then learned which of eight possible actions would cause the game character in a video to change its position. This turned countless hours of existing online video into potential training data. 
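A toy sketch can make the idea of recovering actions from video alone more concrete. It is not Genie's method: it assumes the character's position in each frame is already known, and it uses a hand-written set of eight moves (the hypothetical `CODEBOOK`), whereas a learned system would fit its actions to raw video. Each pair of consecutive frames is labeled with whichever of the eight moves best explains the change in position.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # toy stand-in for a frame: the character's (x, y)

# Hypothetical fixed set of eight "actions" (moves on a grid).
# Assumption for illustration only: a learned model would fit these to data.
CODEBOOK: List[Position] = [
    (1, 0), (-1, 0), (0, 1), (0, -1),
    (1, 1), (-1, 1), (1, -1), (-1, -1),
]


def infer_action(before: Position, after: Position) -> int:
    """Return the index of the codebook move closest to the observed change."""
    dx, dy = after[0] - before[0], after[1] - before[1]

    def squared_distance(i: int) -> int:
        cx, cy = CODEBOOK[i]
        return (cx - dx) ** 2 + (cy - dy) ** 2

    return min(range(len(CODEBOOK)), key=squared_distance)


def label_video(positions: List[Position]) -> List[int]:
    """Label every consecutive pair of frames with an inferred action index."""
    return [infer_action(a, b) for a, b in zip(positions, positions[1:])]
```

No controller inputs appear anywhere in this sketch: the labels come entirely from how the picture changes between frames, which is the property that lets unlabeled footage serve as training data.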

Genie can generate simple games from hand-drawn sketches. (Image: Google DeepMind)

Genie generates each new frame of the game on the fly depending on the action the player takes. Press jump and Genie updates the current image to show the game character jumping; press left and the image changes to show the character moved to the left. The game ticks along action by action, each new frame generated from scratch as the player plays. 
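The action-by-action loop described above can be sketched in a few lines. This is a minimal illustration, not Genie's implementation: `toy_next_frame` is a hypothetical stand-in for the learned model, and a "frame" here is just the character's position rather than an image.

```python
from typing import Callable, List, Tuple

Frame = Tuple[int, int]  # toy stand-in for an image: the character's (x, y)


def toy_next_frame(frame: Frame, action: str) -> Frame:
    """Hypothetical stand-in for the model: (current frame, action) -> next frame."""
    x, y = frame
    if action == "left":
        return (x - 1, y)
    if action == "right":
        return (x + 1, y)
    if action == "jump":
        return (x, y + 1)
    return frame  # unrecognized action: the frame stays the same


def play(
    start: Frame,
    actions: List[str],
    step: Callable[[Frame, str], Frame] = toy_next_frame,
) -> List[Frame]:
    """Advance the game one player action at a time, collecting every frame."""
    frames = [start]
    for action in actions:
        # Each new frame depends only on the latest frame and the chosen action.
        frames.append(step(frames[-1], action))
    return frames
```

Calling `play((0, 0), ["right", "jump"])` yields one new frame per action. Because each frame has to be generated before the next action can be applied, the speed of the model sets the frame rate of the game.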


Future versions of Genie could run faster. “There is no fundamental limitation that prevents us from reaching 30 frames per second,” says Tim Rocktäschel, a research scientist at Google DeepMind who leads the team behind the work. “Genie uses many of the same technologies as contemporary large language models, where there has been significant progress in improving inference speed.” 

Genie learned some common visual quirks found in platformers. Many games of this type use parallax, where the foreground moves sideways faster than the background. Genie often adds this effect to the games it generates.  
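For readers unfamiliar with parallax, here is how a conventional hand-coded platformer might compute it. The sketch is illustrative only: the layer names and speed factors are assumptions, and Genie produces the effect from what it saw in videos rather than from a rule like this.

```python
from typing import Dict


def layer_offsets(camera_x: float, speeds: Dict[str, float]) -> Dict[str, float]:
    """Return the horizontal shift of each layer for a given camera position.

    A layer with speed 1.0 scrolls in step with the camera; smaller values
    scroll more slowly, which makes the layer look farther away.
    """
    return {name: -camera_x * speed for name, speed in speeds.items()}


# Assumed speed factors for illustration: the background moves at a quarter
# of the foreground's speed.
SPEEDS = {"background": 0.25, "foreground": 1.0}
```

With the camera 10 units to the right, `layer_offsets(10, SPEEDS)` shifts the foreground by 10 units and the background by only 2.5, so the foreground appears to slide past faster.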

While Genie is an in-house research project and won’t be released, Guzdial notes that the Google DeepMind team says it could one day be turned into a game-making tool, something he’s working on too. “I’m definitely interested to see what they build,” he says.

Virtual playgrounds

But the Google DeepMind researchers are interested in more than just game generation. The team behind Genie works on open-ended learning, where AI-controlled bots are dropped into a virtual environment and left to learn how to solve various tasks by trial and error (a technique known as reinforcement learning).

In 2021, the team developed a virtual playground called XLand, in which bots learned how to cooperate to solve simple tasks such as moving obstacles. Virtual environments like XLand will be crucial for training future bots on a range of different challenges before pitting them against real-world scenarios. The video game example proves that Genie can produce these virtual sandboxes for bots to play in.

Others have developed similar world-building tools. For example, David Ha at Google Brain and Jürgen Schmidhuber at the AI lab IDSIA in Switzerland developed a tool in 2018 that trained bots in game-based virtual environments called world models. But, again, unlike Genie, these required the training data to include input actions. 


The team showed that learning controls from unlabeled video is useful in robotics too. After Genie was shown videos of real robot arms manipulating a variety of household objects, the model learned what actions the arm could perform and how to control it. Future robots could learn new tasks by watching video tutorials.

“It is hard to predict what use cases will be enabled,” says Rocktäschel. “We hope projects like Genie will eventually provide people with new tools to express their creativity.”
