The Allen Institute for Artificial Intelligence (Ai2), a research nonprofit, is releasing a family of open-source multimodal language models, called Molmo, that it says perform as well as top proprietary models from OpenAI, Google, and Anthropic. 

The organization claims that its biggest Molmo model, which has 72 billion parameters, outperforms OpenAI’s GPT-4o, which is estimated to have over a trillion parameters, in tests that measure things like understanding images, charts, and documents.  

Meanwhile, Ai2 says a smaller Molmo model, with 7 billion parameters, comes close to OpenAI’s state-of-the-art model in performance, an achievement it ascribes to vastly more efficient data collection and training methods. 

What Molmo shows is that open-source AI development is now on par with closed, proprietary models, says Ali Farhadi, the CEO of Ai2. And open-source models have a significant advantage, as their open nature means other people can build applications on top of them. A Molmo demo is publicly available, and the models will be available for developers to tinker with on the Hugging Face website. (Certain elements of the most powerful Molmo model are still shielded from view.)

Other large multimodal language models are trained on vast data sets containing billions of images and text samples that have been hoovered from the internet, and they can include several trillion parameters. This process introduces a lot of noise to the training data and, with it, hallucinations, says Ani Kembhavi, a senior director of research at Ai2. In contrast, Ai2’s Molmo models have been trained on a significantly smaller and more curated data set containing only 600,000 images, and they have between 1 billion and 72 billion parameters. This focus on high-quality data, versus indiscriminately scraped data, has led to good performance with far fewer resources, Kembhavi says.


Ai2 achieved this by having human annotators describe the images in the model's training data set in excruciating detail, over multiple pages of text. It asked the annotators to talk about what they saw instead of typing it, then used AI techniques to convert their speech into data, which made the training process much quicker while reducing the computing power required.

These techniques could prove really useful if we want to meaningfully govern the data that we use for AI development, says Yacine Jernite, who is the machine learning and society lead at Hugging Face, and was not involved in the research. 

“It makes sense that in general, training on higher-quality data can lower the compute costs,” says Percy Liang, the director of the Stanford Center for Research on Foundation Models, who also did not participate in the research. 

Another impressive capability is that the model can “point” at things, meaning it can analyze elements of an image by identifying the pixels that answer queries.

In a demo shared with MIT Technology Review, Ai2 researchers took a photo outside their office of the local Seattle marina and asked the model to identify various elements of the image, such as deck chairs. The model successfully described what the image contained, counted the deck chairs, and accurately pointed to other objects in the image as the researchers asked. It was not perfect, however. It could not locate a specific parking lot, for example.
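In published Molmo examples, pointing answers come back as small tags embedded in the model's text output, with coordinates expressed as percentages of the image's width and height. As a rough illustration of how an application might consume such output, here is a minimal sketch that parses an assumed `<point x="…" y="…">` tag format and converts the percentages to pixel positions (the exact tag format and attribute names are assumptions based on Ai2's examples, not a documented API):

```python
import re

# Assumed tag format for a Molmo "point" answer, e.g.:
#   <point x="23.4" y="61.2" alt="deck chair">deck chair</point>
# Coordinates are treated as percentages of image width/height.
POINT_TAG = re.compile(
    r'<point\s+x="(?P<x>[\d.]+)"\s+y="(?P<y>[\d.]+)"[^>]*>(?P<label>[^<]*)</point>'
)

def extract_points(model_output: str, img_width: int, img_height: int):
    """Parse point tags out of model text and scale them to pixel coordinates."""
    points = []
    for m in POINT_TAG.finditer(model_output):
        points.append({
            "label": m.group("label"),
            "x_px": float(m.group("x")) / 100.0 * img_width,
            "y_px": float(m.group("y")) / 100.0 * img_height,
        })
    return points

# Example: one pointed-at deck chair in a 1000x500 image.
output = '<point x="25.0" y="50.0" alt="deck chair">deck chair</point>'
print(extract_points(output, 1000, 500))
# -> [{'label': 'deck chair', 'x_px': 250.0, 'y_px': 250.0}]
```

Pixel-level answers like these are what would let a downstream agent click a specific element in a user interface rather than merely describe it.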

Other advanced AI models are good at describing scenes and images, says Farhadi. But that’s not enough when you want to build more sophisticated web agents that can interact with the world and can, for example, book a flight. Pointing allows people to interact with user interfaces, he says. 


Jernite says Ai2 is operating with a greater degree of openness than we’ve seen from other AI companies. And while Molmo is a good start, he says, its real significance will lie in the applications developers build on top of it, and the ways people improve it.

Farhadi agrees. AI companies have drawn massive, multibillion-dollar investments over the past few years. But in the past few months, investors have expressed skepticism about whether that investment will bring returns. Big, expensive proprietary models won’t do that, he argues, but open-source ones can. He says the work shows that open-source AI can also be built in a way that makes efficient use of money and time.

“We’re excited about enabling others and seeing what others would build with this,” Farhadi says. 
