Popular image generation models can be prompted to produce identifiable photos of real people, potentially threatening their privacy, according to new research. The work also shows that these AI systems can be made to regurgitate exact copies of medical images and copyrighted work by artists. It’s a finding that could strengthen the case for artists who are currently suing AI companies for copyright violations.

The researchers, from Google, DeepMind, UC Berkeley, ETH Zürich, and Princeton, got their results by repeatedly prompting Stable Diffusion and Google’s Imagen with captions for images in the training data, such as a person’s name. Then they analyzed whether any of the generated images matched originals in the model’s training set. The group managed to extract more than 100 replicas of images from the AI’s training data. 

These image-generating AI models are trained on vast data sets consisting of images with text descriptions that have been scraped from the internet. The latest generation of the technology works by taking images in the data set and progressively adding noise to them, step by step, until the original image is nothing but random pixels. The AI model then learns to reverse the process, turning that noisy mess back into a coherent new image. 
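The forward half of that process, corrupting an image with noise, can be written in closed form. The sketch below is a toy illustration of the standard diffusion "noising" step, not code from the paper; the schedule values and array shapes are illustrative assumptions.

```python
import numpy as np

def diffuse(image, num_steps=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Forward diffusion: progressively mix an image with Gaussian noise.

    After enough steps the result is statistically indistinguishable from
    pure noise; a diffusion model is trained to reverse this corruption
    to generate images. Toy sketch only.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, num_steps)  # per-step noise amounts
    alpha_bar = np.cumprod(1.0 - betas)                   # cumulative fraction of signal kept
    noise = rng.standard_normal(image.shape)
    # Closed form for the final step t:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    t = num_steps - 1
    return np.sqrt(alpha_bar[t]) * image + np.sqrt(1.0 - alpha_bar[t]) * noise

# A flat gray "image" ends up dominated by noise after the full schedule.
x0 = np.full((8, 8), 0.5)
xT = diffuse(x0)
```

With this schedule, the surviving signal coefficient `sqrt(alpha_bar)` shrinks below one percent by the final step, which is why the end state looks like pure static and why the learned reverse process, rather than the original image, is what produces the output.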

The paper marks the first time researchers have proved that these AI models memorize images in their training sets, says Ryan Webster, a PhD student at the University of Caen Normandy in France, who has studied privacy in other image generation models but was not involved in the research. This could have implications for startups wanting to use generative AI models in health care, because it shows that these systems risk leaking sensitive private information. OpenAI, Google, and Stability.AI did not respond to our requests for comment. 


Eric Wallace, a PhD student at UC Berkeley who was part of the study group, says they hope to raise the alarm over the potential privacy issues around these AI models before they are rolled out widely in sensitive sectors like medicine. 

“A lot of people are tempted to try to apply these types of generative approaches to sensitive data, and our work is definitely a cautionary tale that that’s probably a bad idea, unless there’s some kind of extreme safeguards taken to prevent [privacy infringements],” Wallace says.

The extent to which these AI models memorize and regurgitate images from their training data is also at the root of a huge feud between AI companies and artists. Stability.AI is facing two lawsuits, one from a group of artists and another from Getty Images, both arguing that the company unlawfully scraped and processed their copyrighted material. 

The researchers’ findings could strengthen the hand of artists accusing AI companies of copyright violations. If artists whose work was used to train Stable Diffusion can prove that the model has copied their work without permission, the company might have to compensate them.

The findings are timely and important, says Sameer Singh, an associate professor of computer science at the University of California, Irvine, who was not involved in the research. “It is important for general public awareness and to initiate discussions around security and privacy of these large models,” he adds.

The paper demonstrates that it’s possible to work out whether AI models have copied images and measure to what degree this has happened, which are both very valuable in the long term, Singh says. 


Stable Diffusion is open source, meaning anyone can analyze and investigate it. Imagen is closed, but Google granted the researchers access. Singh says the work is a great example of how important it is to give researchers access to these models for analysis, and he argues that companies should be similarly transparent with other AI models, such as OpenAI’s ChatGPT. 

However, while the results are impressive, they come with some caveats. The images the researchers managed to extract appeared multiple times in the training data or were highly unusual relative to other images in the data set, says Florian Tramèr, an assistant professor of computer science at ETH Zürich, who was part of the group. 

People who look unusual or have unusual names are at higher risk of being memorized, says Tramèr.

The researchers were able to extract only a relatively small number of exact copies of individuals’ photos from the AI model: roughly one in a million generated images was a copy, according to Webster.

But that’s still worrying, Tramèr says: “I really hope that no one’s going to look at these results and say ‘Oh, actually, these numbers aren’t that bad if it’s just one in a million.’” 

“The fact that they’re bigger than zero is what matters,” he adds.
