Latest from MIT Tech Review – A way to let robots learn by listening will make them more useful

Most AI-powered robots today use cameras to understand their surroundings and learn new tasks, but it’s becoming easier to train robots with sound too, helping them adapt to tasks and environments where visibility is limited.

Though sight is important, sound is actually more helpful for some daily tasks, like listening to onions sizzling on the stove to tell whether the pan is at the right temperature. Training robots with audio has only been done in highly controlled lab settings, however, and the techniques have lagged behind other fast robot-teaching methods.

Researchers at the Robotics and Embodied AI Lab at Stanford University set out to change that. They first built a system for collecting audio data, consisting of a GoPro camera and a gripper with a microphone designed to filter out background noise. Human demonstrators used the gripper for a variety of household tasks, and the researchers then used this data to train robotic arms to execute the tasks on their own. The team’s new training algorithms help robots gather clues from audio signals to perform more effectively.

“Thus far, robots have been training on videos that are muted,” says Zeyi Liu, a PhD student at Stanford and lead author of the study. “But there is so much helpful data in audio.”

To test how much more successful a robot can be if it’s capable of “listening,” the researchers chose four tasks: flipping a bagel in a pan, erasing a whiteboard, putting two velcro strips together, and pouring dice out of a cup. In each task, sounds provide clues that cameras or tactile sensors struggle to pick up, like whether the eraser is properly contacting the whiteboard or whether the cup contains dice.

After demonstrating each task a couple hundred times, the team compared the success rates of training with audio versus training with vision alone. The results, published in a paper on arXiv that has not been peer-reviewed, were promising. When using vision alone in the dice test, the robot could only tell 27% of the time if there were dice in the cup, but that rose to 94% when sound was included.

It isn’t the first time audio has been used to train robots, Liu says, but it’s a big step toward doing so at scale. “We are making it easier to use audio collected ‘in the wild,’ rather than being restricted to collecting it in the lab, which is more time-consuming.”

The research signals that audio might become a more sought-after data source in the race to train robots with AI. Researchers are teaching robots more quickly than ever before using imitation learning, showing them hundreds of examples of tasks being done instead of hand-coding each task. If audio could be collected at scale using devices like the one in the study, it could provide an entirely new “sense” to robots, helping them more quickly adapt to environments where visibility is limited or not useful.

“It’s safe to say that audio is the most understudied modality for sensing” in robots, says Dmitry Berenson, associate professor of robotics at the University of Michigan, who was not involved in the study. That’s because the bulk of robotics research on manipulating objects has been for industrial pick-and-place tasks, like sorting objects into bins. Those tasks don’t benefit much from sound, instead relying on tactile or visual sensors. But, as robots broaden into tasks in homes, kitchens, and other environments, audio will become increasingly useful, Berenson says.

Consider a robot trying to find which bag contains a set of keys, all with limited visibility. “Maybe even before you touch the keys, you hear them kind of jangling,” Berenson says. “That’s a cue that the keys are in that pocket, instead of others.”

Still, audio has limits. The team points out that sound won’t be as useful with so-called soft or flexible objects like clothes, which don’t create as much usable audio. The robots also struggled to filter out their own motor noise during tasks, since that noise was not present in the training data produced by humans. To fix it, the researchers needed to add robot sounds (whirs, hums, and actuator noises) into the training sets so the robots could learn to tune them out.
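
To make that fix concrete, here is a minimal sketch of this kind of noise augmentation in Python. It is an illustration under stated assumptions, not the Stanford team's code: the function names, the `snr_db` parameter, and the choice to scale the noise to a target signal-to-noise ratio are all hypothetical, and the article does not say how loudly the robot sounds were mixed in.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a list of audio samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_in_noise(demo_audio, robot_noise, snr_db, rng=random):
    """Overlay a random excerpt of robot noise on a human demonstration clip.

    snr_db is the desired signal-to-noise ratio in decibels. It is a
    hypothetical knob for this sketch, not a value taken from the study.
    """
    if len(robot_noise) < len(demo_audio):
        raise ValueError("noise recording must be at least as long as the clip")

    # Pick a random window of noise with the same length as the clip.
    start = rng.randrange(len(robot_noise) - len(demo_audio) + 1)
    excerpt = robot_noise[start:start + len(demo_audio)]

    noise_rms = rms(excerpt)
    if noise_rms == 0:
        # A silent excerpt adds nothing, so return an unchanged copy.
        return list(demo_audio)

    # Scale the noise so that rms(clip) / rms(scaled noise) matches snr_db.
    target_noise_rms = rms(demo_audio) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [d + gain * n for d, n in zip(demo_audio, excerpt)]


# Usage with synthetic stand-ins: a 440 Hz tone plays the role of the
# demonstration audio, and uniform random samples play the robot's motor hum.
rng = random.Random(0)
demo = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(4800)]
augmented = mix_in_noise(demo, noise, snr_db=10, rng=rng)
```

Drawing a different noise window each time a clip is used means the model never sees the same whir twice, which is one plausible way to push it toward ignoring motor sounds rather than memorizing them.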

The next step, Liu says, is to see how much better the models can get with more data, which could mean more microphones, collecting spatial audio, and adding microphones to other types of data-collection devices.
