Latest from MIT Tech Review – It’s easy to tamper with watermarks from AI-generated text

Watermarks for AI-generated text are easy to remove and can be stolen and copied, rendering them useless, researchers have found. They say these kinds of attacks discredit watermarks and can fool people into trusting text they shouldn’t.

Watermarking works by inserting hidden patterns in AI-generated text, which allow computers to detect that the text comes from an AI system. Watermarks are a fairly new invention, but they have already become a popular solution for fighting AI-generated misinformation and plagiarism. For example, the European Union’s AI Act, which enters into force in May, will require developers to watermark AI-generated content. But the new research shows that the cutting edge of watermarking technology doesn’t live up to regulators’ requirements, says Robin Staab, a PhD student at ETH Zürich, who was part of the team that developed the attacks. The research is yet to be peer reviewed.

AI language models work by predicting the next likely word in a sentence, generating one word at a time on the basis of those predictions. Watermarking algorithms for text divide the language model’s vocabulary into words on a “green list” and a “red list,” and then make the AI model choose words from the green list. The more words in a sentence that are from the green list, the more likely it is that the text was generated by a computer. Humans tend to write sentences that include a more random mix of words.
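The green-list idea can be illustrated with a toy detector. This is a minimal sketch under stated assumptions, not any of the watermarking schemes the researchers studied: the keyed hash used to split the vocabulary, the `GAMMA` fraction, and the names `is_green` and `green_z_score` are all hypothetical, and the z-score test is a standard statistical choice rather than something the article specifies.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list


def is_green(word: str, key: str = "demo-key") -> bool:
    """Hypothetical rule: a word is green if the first byte of its keyed hash is small."""
    digest = hashlib.sha256((key + ":" + word.lower()).encode("utf-8")).digest()
    return digest[0] < 256 * GAMMA


def green_z_score(text: str, key: str = "demo-key") -> float:
    """Measure how far the green-word count sits above what unbiased text would give."""
    words = text.split()
    if not words:
        return 0.0
    n = len(words)
    green = sum(is_green(w, key) for w in words)
    # Under the "no watermark" assumption each word is green with probability GAMMA.
    return (green - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

A large positive score suggests the text leans on green words more than chance would explain; a score near zero is what a random mix of words would produce.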

The researchers tampered with five different watermarks that work in this way. They were able to reverse-engineer the watermarks by using an API to access the AI model with the watermark applied and prompting it many times, says Staab. The responses allow the attacker to “steal” the watermark by building an approximate model of the watermarking rules. They do this by analyzing the AI outputs and comparing them with normal text.
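The comparison step described above can be sketched roughly as follows. This is an illustrative approximation, not the ETH Zürich team's algorithm: the function name `estimate_green_words`, the `min_ratio` threshold, and the add-one smoothing are assumptions chosen for the example. The idea is simply that words appearing noticeably more often in watermarked output than in normal text are guessed to be on the green list.

```python
from collections import Counter


def estimate_green_words(watermarked_texts, reference_texts,
                         min_ratio=1.5, smoothing=1.0):
    """Guess green-list words by comparing word frequencies in two corpora."""
    wm = Counter(w.lower() for t in watermarked_texts for w in t.split())
    ref = Counter(w.lower() for t in reference_texts for w in t.split())
    wm_total = sum(wm.values())
    ref_total = sum(ref.values())
    vocab = set(wm) | set(ref)

    guessed = set()
    for word in vocab:
        # Smoothed relative frequencies, so unseen words do not divide by zero.
        p_wm = (wm[word] + smoothing) / (wm_total + smoothing * len(vocab))
        p_ref = (ref[word] + smoothing) / (ref_total + smoothing * len(vocab))
        if p_wm / p_ref >= min_ratio:
            guessed.add(word)
    return guessed
```

With only a handful of responses the guesses would be noisy, which is consistent with the article's point that the attacker prompts the model many times.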

Once the researchers have an approximate idea of what the watermarked words might be, they can execute two kinds of attacks. The first one, called a spoofing attack, allows malicious actors to use the information they learned from stealing the watermark to produce text that can be passed off as being watermarked. The second attack allows hackers to scrub the watermark from AI-generated text, so the text can be passed off as human-written.
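To make the two attack directions concrete, here is a deliberately simplified sketch. The article does not describe how the attacks are implemented, so everything here is an assumption for illustration: the `rewrite` function, the hand-written `synonyms` table, and the word-swapping strategy are hypothetical stand-ins for whatever the researchers actually do.

```python
def rewrite(text, guessed_green, synonyms, mode):
    """Swap words for synonyms: 'scrub' avoids guessed-green words, 'spoof' prefers them."""
    if mode not in ("scrub", "spoof"):
        raise ValueError("mode must be 'scrub' or 'spoof'")
    want_green = mode == "spoof"

    out = []
    for word in text.split():
        # The original word comes first, so it is kept when it already fits the goal.
        options = [word] + synonyms.get(word.lower(), [])
        preferred = [o for o in options if (o.lower() in guessed_green) == want_green]
        out.append(preferred[0] if preferred else word)
    return " ".join(out)


# Hypothetical inputs: a guessed green list and a tiny synonym table.
guessed_green = {"big", "quick"}
synonyms = {"big": ["large"], "large": ["big"], "quick": ["fast"], "fast": ["quick"]}
scrubbed = rewrite("a big quick fox", guessed_green, synonyms, "scrub")
spoofed = rewrite("a large fast fox", guessed_green, synonyms, "spoof")
```

The same stolen approximation of the green list drives both directions, which is why stealing the watermark enables spoofing and scrubbing alike.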

The team had a roughly 80% success rate in spoofing watermarks, and an 85% success rate in stripping AI-generated text of its watermark.

Researchers not affiliated with the ETH Zürich team, such as Soheil Feizi, an associate professor and director of the Reliable AI Lab at the University of Maryland, have also found watermarks to be unreliable and vulnerable to spoofing attacks.

The findings from ETH Zürich confirm that these issues with watermarks persist and extend to the most advanced types of chatbots and large language models being used today, says Feizi.

The research “underscores the importance of exercising caution when deploying such detection mechanisms on a large scale,” he says.

Despite the findings, watermarks remain the most promising way to detect AI-generated content, says Nikola Jovanović, a PhD student at ETH Zürich who worked on the research.

But more research is needed to make watermarks ready for deployment on a large scale, he adds. Until then, we should manage our expectations of how reliable and useful these tools are. “If it’s better than nothing, it is still useful,” he says.
