To receive The Algorithm in your inbox every Monday, sign up here.

Welcome to The Algorithm!

Is anyone else feeling dizzy? Just when the AI community was wrapping its head around the astounding progress of text-to-image systems, we’re already moving on to the next frontier: text-to-video. 

Late last week, Meta unveiled Make-A-Video, an AI that generates five-second videos from text prompts.

Built on open-source data sets, Make-A-Video lets you type in a string of words, like “A dog wearing a superhero outfit with a red cape flying through the sky,” and then generates a clip that, while pretty accurate, has the aesthetics of a trippy old home video. 

The development is a breakthrough in generative AI that also raises some tough ethical questions. Creating videos from text prompts is a lot more challenging and expensive than generating images, and it’s impressive that Meta has come up with a way to do it so quickly. But as the technology develops, there are fears it could be harnessed as a powerful tool to create and disseminate misinformation. You can read my story about it here.

Just days after it was announced, though, Meta’s system is already starting to look kinda basic. It’s one of a number of text-to-video models submitted in papers to one of the leading AI conferences, the International Conference on Learning Representations.

Another, called Phenaki, is even more advanced. 

It can generate video from a still image and a prompt rather than a text prompt alone. It can also make far longer clips: users can create videos multiple minutes long based on several different prompts that form the script for the video. (For example: “A photorealistic teddy bear is swimming in the ocean at San Francisco. The teddy bear goes underwater. The teddy bear keeps swimming under the water with colorful fishes. A panda bear is swimming underwater.”) 


Video generated by Phenaki.

A technology like this could revolutionize filmmaking and animation. It’s frankly amazing how quickly this happened. DALL-E was launched just last year. It’s both extremely exciting and slightly horrifying to think where we’ll be this time next year. 

Researchers from Google also submitted a paper to the conference about their new model called DreamFusion, which generates 3D images based on text prompts. The 3D models can be viewed from any angle, the lighting can be changed, and the model can be plonked into any 3D environment. 

Don’t expect that you’ll get to play with these models anytime soon. Meta isn’t releasing Make-A-Video to the public yet. That’s a good thing. Meta’s model is trained using the same open-source image-data set that was behind Stable Diffusion. The company says it filtered out toxic language and NSFW images, but that’s no guarantee it will have caught all the nuances of human unpleasantness when data sets consist of millions and millions of samples. And the company doesn’t exactly have a stellar track record when it comes to curbing the harm caused by the systems it builds, to put it lightly.

The creators of Phenaki write in their paper that while the videos their model produces are not yet indistinguishable in quality from real ones, that “is within the realm of possibility, even today.” The model’s creators say that before releasing it, they want to get a better understanding of data, prompts, and filtering outputs and to measure biases in order to mitigate harms.


It’s only going to become harder and harder to know what’s real online, and video AI opens up a slew of unique dangers that audio and images don’t, such as the prospect of turbo-charged deepfakes. Platforms like TikTok and Instagram are already warping our sense of reality through augmented facial filters. AI-generated video could be a powerful tool for misinformation, because people have a greater tendency to believe and share fake videos than fake audio and text versions of the same content, according to researchers at Penn State University. 

In short, we haven’t even come close to figuring out what to do about the toxic elements of language models. We’ve only just started examining the harms around text-to-image AI systems. Video? Good luck with that.

Deeper Learning

The EU wants to put companies on the hook for harmful AI

The EU is creating new rules to make it easier to sue AI companies for harm. A new bill published last week, which is likely to become law in a couple of years, is part of a push from Europe to force AI developers not to release dangerous systems.

The bill, called the AI Liability Directive, will add teeth to the EU’s AI Act, which is set to become law around a similar time. The AI Act would require extra checks for “high risk” uses of AI that have the most potential to harm people. This could include AI systems used for policing, recruitment, or health care. 

The liability law would kick in once harm has already happened. It would give people and companies the right to sue for damages when they have been harmed by an AI system—for example, if they can prove that discriminatory AI has been used to disadvantage them as part of a hiring process.


But there’s a catch: Consumers will have to prove that the company’s AI harmed them, which could be a huge undertaking. You can read my story about it here.

Bits and Bytes

How robots and AI are helping develop better batteries
Researchers at Carnegie Mellon used an automated system and machine-learning software to generate electrolytes that could enable lithium-ion batteries to charge faster, addressing one of the major obstacles to the widespread adoption of electric vehicles. (MIT Technology Review)

Can smartphones help predict suicide?
Researchers at Harvard University are using data collected from smartphones and wearable biosensors, such as Fitbit watches, to create an algorithm that might help predict when patients are at risk of suicide and help clinicians intervene. (The New York Times)

OpenAI has made its text-to-image AI DALL-E available to all. 
AI-generated images are going to be everywhere. You can try the software here.

Someone has made an AI that creates Pokémon lookalikes of famous people.
The only image-generation AI that matters. (The Washington Post)

Thanks for reading! See you next week. 

Melissa
