Latest from MIT Tech Review – AI-generated images can teach robots how to act

Generative AI models can produce images in response to prompts within seconds, and they’ve recently been used for everything from highlighting their own inherent bias to preserving precious memories.

Now, researchers from Stephen James’s Robot Learning Lab in London are using image-generating AI models for a new purpose: creating training data for robots. They’ve developed a new system, called Genima, that fine-tunes the image-generating AI model Stable Diffusion to draw robots’ movements, helping guide them both in simulations and in the real world. The research is due to be presented at the Conference on Robot Learning (CoRL) next month.

The system could make it easier to train different types of robots to complete tasks—machines ranging from mechanical arms to humanoid robots and driverless cars. It could also help make AI web agents, a next generation of AI tools that can carry out complex tasks with little supervision, better at scrolling and clicking, says Mohit Shridhar, a research scientist specializing in robotic manipulation, who worked on the project.

“You can use image-generation systems to do almost all the things that you can do in robotics,” he says. “We wanted to see if we could take all these amazing things that are happening in diffusion and use them for robotics problems.”

To teach a robot to complete a task, researchers normally train a neural network on an image of what’s in front of the robot. The network then spits out an output in a different format—the coordinates required to move forward, for example.

Genima’s approach is different because both its input and output are images, which is easier for the machines to learn from, says Ivan Kapelyukh, a PhD student at Imperial College London, who specializes in robot learning but wasn’t involved in this research.

“It’s also really great for users, because you can see where your robot will move and what it’s going to do. It makes it kind of more interpretable, and means that if you’re actually going to deploy this, you could see before your robot went through a wall or something,” he says.

Genima works by tapping into Stable Diffusion’s ability to recognize patterns (knowing what a mug looks like because it’s been trained on images of mugs, for example) and then turning the model into a kind of agent—a decision-making system.


First, the researchers fine-tuned Stable Diffusion to let them overlay data from robot sensors onto images captured by its cameras.

The system renders the desired action, like opening a box, hanging up a scarf, or picking up a notebook, into a series of colored spheres on top of the image. These spheres tell the robot where its joints should move one second in the future.
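To make the idea of "actions drawn as colored spheres" concrete, here is a minimal sketch of what such an annotated image could look like as data. It is not Genima's rendering code: in Genima the fine-tuned Stable Diffusion model produces the drawing, whereas this example paints flat discs by hand. The joint names, the color code, and the helper names `blank_image` and `draw_targets` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

# Hypothetical color code, one color per joint (not taken from the research).
JOINT_COLORS: Dict[str, Pixel] = {
    "base": (255, 0, 0),
    "elbow": (0, 255, 0),
    "gripper": (0, 0, 255),
}


def blank_image(width: int, height: int, fill: Pixel = (0, 0, 0)) -> Image:
    """Create a height x width RGB image stored as nested lists."""
    return [[fill for _ in range(width)] for _ in range(height)]


def draw_targets(
    image: Image, targets: Dict[str, Tuple[int, int]], radius: int = 2
) -> Image:
    """Return a copy of `image` with a filled disc at each joint's target pixel.

    `targets` maps a joint name to the (x, y) pixel where that joint
    should be at the next time step.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]  # copy rows so the input stays untouched
    for joint, (cx, cy) in targets.items():
        color = JOINT_COLORS[joint]
        # Only visit the bounding box of the disc, clipped to the image.
        for y in range(max(0, cy - radius), min(height, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(width, cx + radius + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    out[y][x] = color
    return out


# Stand-in for a camera frame; a real frame would come from the robot's camera.
camera_frame = blank_image(32, 24)
annotated = draw_targets(camera_frame, {"elbow": (10, 8), "gripper": (20, 15)})
```

Because the output is itself an image, a person can look at `annotated` and see where each joint is meant to go before the robot moves, which is the interpretability benefit described earlier in the article.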

The second part of the process converts these spheres into actions. The team achieved this by using another neural network, called ACT, which is trained on the same data. Then they used Genima to complete 25 simulated tasks and nine real-world manipulation tasks using a robot arm. The average success rates were 50% and 64%, respectively.
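The second stage has to turn a picture with colored markers back into numbers a controller can use. The sketch below shows that direction of the mapping in the simplest possible way: it finds the centroid of each marker color in an image. This is a hand-written stand-in, not ACT, which is a learned neural network; the function name `find_target_centers` and the color assignments are hypothetical, and the step from pixel positions to actual joint commands is left out.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def find_target_centers(
    image: Image, joint_colors: Dict[str, Pixel]
) -> Dict[str, Tuple[float, float]]:
    """Return the (x, y) centroid of the pixels matching each joint's color.

    Joints whose color does not appear in the image are omitted.
    """
    sums: Dict[str, Tuple[int, int, int]] = {}
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            for joint, color in joint_colors.items():
                if pixel == color:
                    sx, sy, n = sums.get(joint, (0, 0, 0))
                    sums[joint] = (sx + x, sy + y, n + 1)
    return {joint: (sx / n, sy / n) for joint, (sx, sy, n) in sums.items()}


# Tiny 6x6 example image with a 2x2 green marker (assumed to mean "elbow").
GREEN: Pixel = (0, 255, 0)
BLUE: Pixel = (0, 0, 255)
frame: Image = [[(0, 0, 0)] * 6 for _ in range(6)]
for y in (2, 3):
    for x in (3, 4):
        frame[y][x] = GREEN

centers = find_target_centers(frame, {"elbow": GREEN, "gripper": BLUE})
print(centers)
```

A learned controller is presumably more forgiving than exact color matching, since generated images are unlikely to contain perfectly uniform marker colors.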

Although these success rates aren’t particularly high, Shridhar and the team are optimistic that the robot’s speed and accuracy can improve. They’re particularly interested in applying Genima to video-generation AI models, which could help a robot predict a sequence of future actions instead of just one.

The research could be particularly useful for training home robots to fold laundry, close drawers, and perform other domestic tasks. However, its generalized approach means it’s not limited to a specific kind of machine, says Zoey Chen, a PhD student at the University of Washington, who has also previously used Stable Diffusion to generate training data for robots but was not involved in this study.

“This is a really exciting new direction,” she says. “I think this can be a general way to train data for all kinds of robots.”
