Teaching robots to navigate new environments is tough. You can train them on real-world data taken from recordings made by humans, but that data is scarce and expensive to collect. Digital simulations are a rapid, scalable way to teach them new things, but the robots often fail when they’re pulled out of virtual worlds and asked to do the same tasks in the real one.

Now, there’s potentially a better option: a new system that uses generative AI models in conjunction with a physics simulator to develop virtual training grounds that more accurately mirror the physical world. Robots trained using this method worked with a higher success rate than those trained using more traditional techniques during real-world tests. 

Researchers used the system, called LucidSim, to train a robot dog in parkour, getting it to scramble over a box and climb stairs, despite never seeing any real-world data. The approach demonstrates how helpful generative AI could be when it comes to teaching robots to do challenging tasks. It also raises the possibility that we could ultimately train them in entirely virtual worlds. The research was presented at the Conference on Robot Learning (CoRL) last week.

“We’re in the middle of an industrial revolution for robotics,” says Ge Yang, a postdoc scholar at MIT CSAIL who worked on the project. “This is our attempt at understanding the impact of these [generative AI] models outside of their original intended purposes, with the hope that it will lead us to the next generation of tools and models.” 

LucidSim uses a combination of generative AI models to create the visual training data. First, the researchers generated thousands of prompts for ChatGPT, getting it to create descriptions of a range of environments that represent the conditions the robot will encounter in the real world, including different types of weather, times of day, and lighting conditions. These included ‘an ancient alley lined with tea houses and small, quaint shops, each displaying traditional ornaments and calligraphy’ and ‘the sun illuminates a somewhat unkempt lawn dotted with dry patches.’
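The prompt-generation step can be pictured with a minimal sketch. It is not the researchers’ code: the function name, the prompt wording and the specific values below are illustrative assumptions, and only the idea of varying weather, time of day and lighting comes from the description above. The sketch builds the requests that would be sent to a language model and stops short of calling one.

```python
from itertools import product

# Hypothetical axes of variation. The article names weather, time of day
# and lighting; the specific values here are illustrative assumptions.
SETTINGS = ["a narrow alley lined with small shops", "an unkempt lawn with dry patches"]
WEATHER = ["clear", "overcast", "rainy"]
TIMES_OF_DAY = ["early morning", "midday", "dusk"]
LIGHTING = ["harsh direct light", "soft diffuse light"]


def build_meta_prompts(settings, weather, times_of_day, lighting):
    """Return one request string per combination of conditions.

    Each string is a request for a scene description; sending it to a
    language model is deliberately left out of this sketch.
    """
    prompts = []
    for setting, sky, time, light in product(settings, weather, times_of_day, lighting):
        prompts.append(
            f"Write a one-sentence description of {setting}. "
            f"Weather: {sky}. Time of day: {time}. Lighting: {light}."
        )
    return prompts


meta_prompts = build_meta_prompts(SETTINGS, WEATHER, TIMES_OF_DAY, LIGHTING)
print(len(meta_prompts))  # 2 * 3 * 3 * 2 = 36 combinations
print(meta_prompts[0])
```

Crossing a handful of values per axis is one simple way to reach thousands of distinct requests; the article does not say how the researchers actually varied theirs.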


These descriptions were fed into a system that maps 3D geometry and physics data onto AI-generated images, creating short videos that trace the trajectory the robot will follow. The robot draws on this information to work out the height, width and depth of the things it has to navigate, such as a box or a set of stairs.

The researchers tested LucidSim by instructing a four-legged robot equipped with a webcam to complete several tasks, including locating a traffic cone or soccer ball, climbing over a box and walking up and down stairs. The robot performed consistently better than when it ran a system trained on traditional simulations. Out of 20 trials to locate the cone, LucidSim had a 100% success rate, compared to 70% for systems trained on standard simulations. Similarly, in another 20 trials, the robot running LucidSim reached the soccer ball 85% of the time, compared to just 35% for the other system.

Finally, when the robot was running LucidSim, it successfully completed all 10 stair-climbing trials, compared to a success rate of just 50% for the other system.

From left to right: Phillip Isola, Ge Yang and Alan Yu
COURTESY OF MIT CSAIL

These results are likely to improve even further in the future if LucidSim draws directly from sophisticated generative video models rather than a rigged-together combination of language, image and physics models, says Phillip Isola, an associate professor at MIT who worked on the research.

The researchers’ approach to using generative AI is a novel one that will pave the way for more interesting new research, says Mahi Shafiullah, a PhD student at New York University who is using AI models to train robots, and did not work on the project. 


“The more interesting direction I see personally is a mix of both real and realistic ‘imagined’ data that can help our current data-hungry methods scale quicker and better,” he says.

The ability to train a robot from scratch purely on AI-generated situations and scenarios is a significant achievement, and one that could extend beyond machines to more generalized AI agents, says Zafeirios Fountas, a senior research scientist at Huawei specializing in brain‑inspired AI.

“The term robots here is used very generally; we’re talking about some sort of AI that interacts with the real world,” he says. “I can imagine this being used to control any sort of visual information, from robots and self-driving cars up to controlling your computer screen or smartphone.”

In terms of next steps, the authors are interested in trying to train a humanoid robot using wholly synthetic data, which they acknowledge is an ambitious goal, as bipedal robots are typically less stable than their four-legged counterparts. They’re also turning their attention to another new challenge: using LucidSim to train the kinds of robotic arms that work in factories and kitchens, which requires a lot more dexterity and physical understanding than running around a landscape. 

“To actually pick up a cup of coffee and pour it is a very hard, open problem,” says Isola. “If we could take a simulation that’s been augmented with generative AI to create a lot of diversity and train a very robust agent that can operate in a cafe, I think that would be very cool.”
