Latest from MIT Tech Review – AI just beat a human test for creativity. What does that even mean?

AI is getting better at passing tests designed to measure human creativity. In a study published in Scientific Reports today, AI chatbots achieved higher average scores than humans in the Alternate Uses Task, a test commonly used to assess this ability.

This study will add fuel to an ongoing debate among AI researchers about what it even means for a computer to pass tests devised for humans. The findings do not necessarily indicate that AIs are developing an ability to do something uniquely human. It may simply be that AIs are good at passing creativity tests, not that they are creative in the sense we usually mean. However, research like this might give us a better understanding of how humans and machines approach creative tasks.

Researchers started by asking three AI chatbots—OpenAI’s ChatGPT and GPT-4 as well as Copy.Ai, which is built on GPT-3—to come up with as many uses for a rope, a box, a pencil, and a candle as possible within just 30 seconds.

Their prompts instructed the large language models to come up with original and creative uses for each of the items, explaining that the quality of the ideas was more important than the quantity. Each chatbot was tested 11 times for each of the four objects. The researchers also gave 256 human participants the same instructions.

The researchers used two methods to assess both the AI and the human responses. The first was an algorithm that measured how far each suggested use was, in meaning, from the object's original purpose. The second involved six human assessors (who were unaware that some of the answers had been generated by AI systems) rating each response on a scale of 1 to 5 for how creative and original it was, where 1 meant not at all and 5 meant very. Average scores for both humans and AIs were then calculated.
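To make the two scoring methods concrete, here is a minimal sketch. It assumes semantic distance is computed in a common way, as one minus the cosine similarity between embedding vectors for the object and for the proposed use; the article does not say which tool or embedding model the study used, and the three-dimensional vectors below are invented toy values. The `mean_rating` helper simply averages the six assessors' 1-to-5 scores.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_distance(obj: list[float], use: list[float]) -> float:
    """Assumed scoring rule: larger values mean the use is further from the object's usual purpose."""
    return 1.0 - cosine_similarity(obj, use)


def mean_rating(ratings: list[int]) -> float:
    """Average of human ratings on the 1 (not at all creative) to 5 (very creative) scale."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be a non-empty list of values from 1 to 5")
    return sum(ratings) / len(ratings)


# Hypothetical toy embeddings, not taken from any real model or from the study.
rope = [0.9, 0.1, 0.0]
tie_a_package = [0.8, 0.2, 0.1]   # close to what a rope is normally for
make_a_hammock = [0.3, 0.7, 0.4]  # a less obvious use

print(semantic_distance(rope, tie_a_package))
print(semantic_distance(rope, make_a_hammock))
print(mean_rating([3, 4, 2, 5, 4, 3]))  # six hypothetical assessors
```

With these toy vectors, the hammock idea lands much further from "rope" than tying a package does, which is the kind of difference the automated score is meant to reward; the human ratings capture judgments of creativity that a distance measure alone may miss.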

Although the chatbots' responses were rated better than the humans' on average, the best-scoring human responses still outscored the chatbots' best.
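The gap between average and best performance is easier to see with numbers. The scores below are invented for illustration and are not the study's data: a group whose responses are consistently decent can beat a more variable group on the mean while still losing on the single best response.

```python
from statistics import mean

# Hypothetical per-response creativity scores on the 1-5 scale (invented, not from the study).
human_scores = [1.5, 2.0, 2.5, 3.0, 4.8]    # wide spread, one standout answer
chatbot_scores = [3.2, 3.4, 3.5, 3.6, 3.8]  # consistently middling-to-good


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the average score and the single best score for a group."""
    return {"mean": mean(scores), "best": max(scores)}


print("humans:  ", summarize(human_scores))
print("chatbots:", summarize(chatbot_scores))
```

Here the chatbot group has the higher mean while the human group has the higher maximum, the same pattern the study reports at a much larger scale.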

While the purpose of the study was not to prove that AI systems are capable of replacing humans in creative roles, it raises philosophical questions about the characteristics that are unique to humans, says Simone Grassini, an associate professor of psychology at the University of Bergen, Norway, who co-led the research.

“We’ve shown that in the past few years, technology has taken a very big leap forward when we talk about imitating human behavior,” he says. “These models are continuously evolving.”

Proving that machines can perform well in tasks designed for measuring creativity in humans doesn’t demonstrate that they’re capable of anything approaching original thought, says Ryan Burnell, a senior research associate at the Alan Turing Institute, who was not involved with the research.

The chatbots that were tested are “black boxes,” meaning that we don’t know exactly what data they were trained on, or how they generate their responses, he says. “What’s very plausibly happening here is that a model wasn’t coming up with new creative ideas—it was just drawing on things it’s seen in its training data, which could include this exact Alternate Uses Task,” he explains. “In that case, we’re not measuring creativity. We’re measuring the model’s past knowledge of this kind of task.”

That doesn’t mean that it’s not still useful to compare how machines and humans approach certain problems, says Anna Ivanova, an MIT postdoctoral researcher studying language models, who did not work on the project.

However, we should bear in mind that although chatbots are very good at completing specific requests, slight tweaks like rephrasing a prompt can be enough to stop them from performing as well, she says. Ivanova believes that these kinds of studies should prompt us to examine the link between the task we’re asking AI models to complete and the cognitive capacity we’re trying to measure. “We shouldn’t assume that people and models solve problems in the same way,” she says.
