Latest from MIT Tech Review – Google DeepMind’s new generative model makes Super Mario-like games from scratch

OpenAI’s recent reveal of its stunning generative model Sora pushed the envelope of what’s possible with text-to-video. Now Google DeepMind brings us text-to-video games.

The new model, called Genie, can take a short description, a hand-drawn sketch or a photo and turn it into a playable video game in the style of classic 2D platformers like Super Mario Bros. But don’t expect anything fast-paced. The games run at one frame per second, compared to the typical 30-60 frames per second of most modern games.

“It’s cool work,” says Matthew Guzdial, an AI researcher at the University of Alberta, who developed a similar game generator a few years ago.

Genie was trained on 30,000 hours of video of hundreds of 2D platform games taken from the internet. Others have taken that approach before, says Guzdial. His own game generator learned from videos to create abstract platformers. Nvidia used video data to train a model called GameGAN, which could produce clones of games like Pac-Man.

But all of these examples trained the model on input actions (button presses on a game controller) as well as video footage: a video frame showing Mario jumping was paired with the “jump” action, and so on. Tagging video footage with input actions takes a lot of work, however, and this has limited the amount of training data available.

In contrast, Genie was trained on video footage alone. It then learned which of eight possible actions would cause the game character in a video to change its position. This turned countless hours of existing online video into potential training data.
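To make the idea of learning actions from unlabeled video concrete, here is a toy sketch in Python. It is not Genie's method: it assumes each frame has already been reduced to the character's (x, y) position, and it uses a hand-picked table of eight movement codes (the hypothetical `CODEBOOK`), whereas a real model would learn its codes from pixels. The point is only that an action label can be inferred from how consecutive frames differ, with no controller input recorded.

```python
from typing import List, Tuple

Position = Tuple[int, int]

# Eight hypothetical movement codes, chosen by hand for this illustration.
# A trained model would learn its own set of codes from video.
CODEBOOK: List[Position] = [
    (0, 0), (1, 0), (-1, 0), (0, 1),
    (0, -1), (1, 1), (-1, 1), (1, -1),
]


def infer_latent_action(prev: Position, nxt: Position) -> int:
    """Return the index of the code closest to the observed movement."""
    dx, dy = nxt[0] - prev[0], nxt[1] - prev[1]
    return min(
        range(len(CODEBOOK)),
        key=lambda i: (CODEBOOK[i][0] - dx) ** 2 + (CODEBOOK[i][1] - dy) ** 2,
    )


def label_video(positions: List[Position]) -> List[int]:
    """Assign one inferred action index to each pair of consecutive frames."""
    return [infer_latent_action(a, b) for a, b in zip(positions, positions[1:])]
```

Calling `label_video` on a list of positions taken from a clip yields one action index per frame transition, which is the kind of label that would otherwise have to be tagged by hand.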

Image: Genie can generate simple games from hand-drawn sketches. (Credit: Google DeepMind)

Genie generates each new frame of the game on the fly depending on the action the player takes. Press jump and Genie updates the current image to show the game character jumping; press left and the image changes to show the character moved to the left. The game ticks along action by action, each new frame generated from scratch as the player plays.
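The action-by-action loop described above can be sketched in a few lines of Python. This is an illustration under strong assumptions, not Genie's implementation: a "frame" here is just the character's (x, y) position, and the hypothetical `toy_dynamics` function stands in for the learned model that would generate the next image.

```python
from typing import Callable, List, Tuple

Frame = Tuple[int, int]  # toy stand-in for an image: the character's (x, y)


def toy_dynamics(frame: Frame, action: str) -> Frame:
    """Hypothetical stand-in for a learned model: next frame from frame + action."""
    x, y = frame
    if action == "left":
        return (x - 1, y)
    if action == "right":
        return (x + 1, y)
    if action == "jump":
        return (x, y + 1)
    return frame  # unknown action: the frame is unchanged


def play(
    first_frame: Frame,
    actions: List[str],
    step: Callable[[Frame, str], Frame] = toy_dynamics,
) -> List[Frame]:
    """Generate one new frame per player action, each based on the previous frame."""
    frames = [first_frame]
    for action in actions:
        frames.append(step(frames[-1], action))
    return frames
```

Each call to `step` depends only on the latest frame and the player's input, which is why the game advances one action at a time rather than playing back a pre-rendered video.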

Future versions of Genie could run faster. “There is no fundamental limitation that prevents us from reaching 30 frames per second,” says Tim Rocktäschel, a research scientist at Google DeepMind who leads the team behind the work. “Genie uses many of the same technologies as contemporary large language models, where there has been significant progress in improving inference speed.”

Genie learned some common visual quirks found in platformers. Many games of this type use parallax, where the foreground moves sideways faster than the background. Genie often adds this effect to the games it generates.
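For readers unfamiliar with parallax, the effect is usually produced by scrolling each layer at a fraction of the camera's movement. The sketch below shows that arithmetic in Python; the layer names and speed factors are assumptions chosen for illustration, and nothing here reflects how Genie itself produces the effect.

```python
from typing import Dict


def layer_offsets(camera_x: float, factors: Dict[str, float]) -> Dict[str, float]:
    """Horizontal draw offset for each layer, given how far the camera has moved.

    A factor of 1.0 moves a layer in step with the camera (foreground);
    smaller factors make distant layers appear to move more slowly.
    """
    return {name: -camera_x * factor for name, factor in factors.items()}


# Hypothetical layers: the background scrolls at a quarter of the foreground speed.
offsets = layer_offsets(100, {"background": 0.25, "foreground": 1.0})
```

With the camera 100 pixels to the right, the foreground shifts by 100 pixels and the background by only 25, which gives the scene its sense of depth.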

While Genie is an in-house research project and won’t be released, Guzdial notes that the Google DeepMind team says it could one day be turned into a game-making tool, something he’s working on too. “I’m definitely interested to see what they build,” he says.

Virtual playgrounds

But the Google DeepMind researchers are interested in more than just game generation. The team behind Genie works on open-ended learning, where AI-controlled bots are dropped into a virtual environment and left to learn how to solve various tasks by trial and error (a technique known as reinforcement learning).

In 2021, the team developed a virtual playground called XLand, in which bots learned how to cooperate to solve simple tasks such as moving obstacles. Virtual environments like XLand will be crucial for training future bots on a range of different challenges before pitting them against real-world scenarios. The video game example proves that Genie can produce these virtual sandboxes for bots to play in.

Others have developed similar world-building tools. For example, David Ha at Google Brain and Jürgen Schmidhuber at the AI lab IDSIA in Switzerland developed a tool in 2018 that trained bots in game-based virtual environments called world models. But, again, unlike Genie, these required the training data to include input actions.

The team demonstrated how this ability is useful in robotics too. By showing Genie videos of real robot arms manipulating a variety of household objects, the model learned what actions that arm could do and how to control it. Future robots could learn new tasks by watching video tutorials.

“It is hard to predict what use cases will be enabled,” says Rocktäschel. “We hope projects like Genie will eventually provide people with new tools to express their creativity.”
