Latest from MIT Tech Review – Meta’s new AI models can recognize and produce speech for more than 1,000 languages

Meta has built AI models that can recognize and produce speech for more than 1,000 languages—a tenfold increase on what’s currently available. It’s a significant step toward preserving languages that are at risk of disappearing, the company says.

Meta is releasing its models to the public via the code hosting service GitHub. It claims that making them open source will help developers working in different languages to build new speech applications—like messaging services that understand everyone, or virtual-reality systems that can be used in any language.

There are around 7,000 languages in the world, but existing speech recognition models cover only about 100 of them comprehensively. This is because these kinds of models tend to require huge amounts of labeled training data, which is available for only a small number of languages, including English, Spanish, and Chinese.

Meta researchers got around this problem by retraining an existing AI model developed by the company in 2020 that is able to learn speech patterns from audio without requiring large amounts of labeled data, such as transcripts.

They trained it on two new data sets: one containing audio recordings of the New Testament, with the corresponding text taken from the internet, in 1,107 languages, and another containing unlabeled New Testament audio recordings in 3,809 languages. The team processed the speech audio and the text data to improve their quality before running an algorithm designed to align audio recordings with accompanying text. They then repeated this process with a second algorithm trained on the newly aligned data. With this method, the researchers were able to teach the algorithm to learn a new language more easily, even without the accompanying text.

“We can use what that model learned to then quickly build speech systems with very, very little data,” says Michael Auli, a research scientist at Meta who worked on the project.

“For English, we have lots and lots of good data sets, and we have that for a few more languages, but we just don’t have that for languages that are spoken by, say, 1,000 people.”

The researchers say their models can recognize and produce speech in more than 1,000 languages, and can identify which language is being spoken for more than 4,000.

They compared the models with those from rival companies, including OpenAI's Whisper, and claim theirs had half the error rate despite covering 11 times more languages.
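The article does not say which metric lies behind the "error rate" comparison. Speech recognition systems are commonly scored by word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration under that assumption; the function name and the example sentences are hypothetical and do not come from Meta's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up transcripts, for illustration only.
reference = "the cat sat on the mat"
system_a = "the cat sat on mat"   # one word dropped
system_b = "the bat sat on mat"   # one word dropped, one word wrong

print(word_error_rate(reference, system_a))
print(word_error_rate(reference, system_b))
```

In this toy case the first transcript has one error and the second has two against the same six-word reference, so the first system has half the word error rate of the second. That is the kind of relationship the claim above describes, although real evaluations average over many utterances and languages.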

However, the team warns the model is still at risk of mistranscribing certain words or phrases, which could result in inaccurate or potentially offensive labels. They also acknowledge that their speech recognition models yielded more biased words than other models, albeit only 0.7% more.

While the scope of the research is impressive, the use of religious texts to train AI models can be controversial, says Chris Emezue, a researcher at Masakhane, an organization working on natural-language processing for African languages, who was not involved in the project.

“The Bible has a lot of bias and misrepresentations,” he says.
