Latest from MIT Tech Review – Meta’s new AI can make video based on text prompts

Meta has today unveiled an AI system that generates short videos based on text prompts.

Make-A-Video lets you type in a string of words, like “A dog wearing a superhero outfit with a red cape flying through the sky,” and then generates a five-second clip that, while pretty accurate, has the aesthetics of a trippy old home video.

Although the effect is rather crude, the system offers an early glimpse of what’s coming next for generative artificial intelligence, and it is the next obvious step from the text-to-image AI systems that have caused huge excitement this year.

Meta’s announcement of Make-A-Video, which is not yet being made available to the public, will likely prompt other AI labs to release their own versions. It also raises some big ethical questions.

In the last month alone, AI lab OpenAI has made its latest text-to-image AI system DALL-E available to everyone, and AI startup Stability.AI launched Stable Diffusion, an open-source text-to-image system.

But text-to-video AI comes with even greater challenges. For one, these models need a vast amount of computing power. They are an even bigger computational lift than large text-to-image AI models, which are trained on millions of images, because putting together just one short video requires hundreds of images. That means that, for the foreseeable future, only large tech companies can realistically afford to build these systems. They’re also trickier to train, because there aren’t large-scale data sets of high-quality videos paired with text.

To work around this, Meta combined data from three open-source image and video data sets to train its model. Standard text-image data sets of labeled still images helped the AI learn what objects are called and what they look like. And a database of videos helped it learn how those objects are supposed to move in the world. The combination of the two approaches helped Make-A-Video, which is described in a non-peer-reviewed paper published today, generate videos from text at scale.

Tanmay Gupta, a computer vision research scientist at the Allen Institute for Artificial Intelligence, says Meta’s results are promising. The videos Meta has shared show that the model can capture 3D shapes as the camera rotates. The model also has some notion of depth and an understanding of lighting. Gupta says some details and movements are decently done and convincing.

“A young couple walking in heavy rain”

However, “there’s plenty of room for the research community to improve on, especially if these systems are to be used for video editing and professional content creation,” he adds. In particular, it’s still tough to model complex interactions between objects.

In the video generated by the prompt “An artist’s brush painting on a canvas,” the brush moves over the canvas, but strokes on the canvas aren’t realistic. “I would love to see these models succeed at generating a sequence of interactions, such as ‘The man picks up a book from the shelf, puts on his glasses, and sits down to read it while drinking a cup of coffee,’” Gupta says.

“An artist’s brush painting on a canvas”

For its part, Meta promises that the technology could “open new opportunities for creators and artists.” But as the technology develops, there are fears it could be harnessed as a powerful tool to create and disseminate misinformation and deepfakes. It might make it even more difficult to differentiate between real and fake content online.

Meta’s model ups the stakes for generative AI both technically and creatively but also “in terms of the unique harms that could be caused through generated video as opposed to still images,” says Henry Ajder, an expert on synthetic media.

“At least today, creating factually inaccurate content that people might believe in requires some effort,” Gupta says. “In the future, it may be possible to create misleading content with a few keystrokes.”

The researchers who built Make-A-Video filtered out offensive images and words, but with data sets that consist of millions and millions of words and images, it is almost impossible to fully remove biased and harmful content.

A spokesperson for Meta says it is not making the model available to the public yet, and that “as part of this research, we will continue to explore ways to further refine and mitigate potential risk.”
