This story originally appeared in The Algorithm, our weekly newsletter on AI. To get stories like this in your inbox first, sign up here.

OpenAI’s Olivier Godement, head of product for its platform, and Romain Huet, head of developer experience, are on a whistle-stop tour around the world. Last week, I sat down with the pair in London before DevDay, the company’s annual developer conference. London’s DevDay is the first the company has held outside San Francisco. Godement and Huet are heading to Singapore next.

It’s been a busy few weeks for the company. In London, OpenAI announced updates to its new Realtime API, which lets developers build voice features into their applications. The company is rolling out new voices and a function that generates prompts, which should help developers build apps and more helpful voice assistants more quickly. Meanwhile, for consumers, OpenAI announced it was launching ChatGPT search, which lets users search the internet using the chatbot. Read more here.
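
For developers, the shape of the Realtime API is worth a look. Below is a minimal sketch of opening a session over a WebSocket and requesting a spoken response. The model name, headers, and event types follow OpenAI’s documentation at the time of writing; treat the specifics as assumptions and check the current API reference before building on them.

```python
# Minimal sketch of connecting to OpenAI's Realtime API over a WebSocket.
# Model name, headers, and event shapes follow OpenAI's docs at the time
# of writing; treat them as assumptions, not a definitive implementation.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # Note: older websockets releases call this kwarg "extra_headers".
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Ask the model for a spoken-plus-text response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user briefly.",
            },
        }))
        # Stream server events until the response is complete.
        async for message in ws:
            event = json.loads(message)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```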

Both developments pave the way for the next big thing in AI: agents. These are AI assistants that can complete complex chains of tasks, such as booking flights. (You can read my explainer on agents here.) 

“Fast-forward a few years—every human on Earth, every business, has an agent. That agent knows you extremely well. It knows your preferences,” Godement says. The agent will have access to your emails, apps, and calendars and will act like a chief of staff, interacting with each of these tools and even working on long-term problems, such as writing a paper on a particular topic, he says. 

OpenAI’s strategy is to both build agents itself and allow developers to use its software to build their own agents, says Godement. Voice will play an important role in what agents will look and feel like. 

“At the moment most of the apps are chat based … which is cool, but not suitable for all use cases. There are some use cases where you’re not typing, not even looking at the screen, and so voice essentially has a much better modality for that,” he says. 

But there are two big hurdles that need to be overcome before agents can become a reality, Godement says. 

The first is reasoning. Building AI agents requires us to trust that they will complete complex tasks and do the right things, says Huet. That’s where OpenAI’s “reasoning” feature comes in. Introduced in OpenAI’s o1 model last month, it uses reinforcement learning to teach the model to process information using a “chain of thought.” Giving the model more time to generate answers allows it to recognize and correct mistakes, break problems down into smaller ones, and try different approaches to answering questions, Godement says.
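
In practice, o1 is called like any other chat model; the chain of thought happens server-side before the answer comes back. Here is a minimal sketch using OpenAI’s Python SDK. The model name and usage fields match the documentation at the time of writing, but treat them as assumptions.

```python
# Minimal sketch of calling OpenAI's o1 reasoning model via the Python SDK.
# Model name and usage fields follow OpenAI's docs at the time of writing;
# treat them as assumptions if you adapt this.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    # The o1 preview took plain user messages; system prompts and
    # temperature were not supported at launch.
    messages=[{
        "role": "user",
        "content": "A bat and a ball cost $1.10 total. The bat costs $1.00 "
                   "more than the ball. How much does the ball cost?",
    }],
)

print(response.choices[0].message.content)
# The hidden chain of thought is billed separately as "reasoning tokens".
print(response.usage.completion_tokens_details.reasoning_tokens)
```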

But OpenAI’s claims about reasoning should be taken with a pinch of salt, says Chirag Shah, a computer science professor at the University of Washington. Large language models are not exhibiting true reasoning, he argues; more likely, they have picked up patterns that merely look like logic from their training data.

“These models sometimes seem to be really amazing at reasoning, but it’s just like they’re really good at pretending, and it only takes a little bit of picking at them to break them,” he says.

There is still much more work to be done, Godement admits. In the short term, AI models such as o1 need to be much more reliable, faster, and cheaper. In the long term, the company needs to apply its chain-of-thought technique to a wider pool of use cases. OpenAI has focused on science, coding, and math. Now it wants to address other fields, such as law, accounting, and economics, he says. 

Second on the to-do list is the ability to connect different tools, Godement says. An AI model’s capabilities will be limited if it has to rely on its training data alone. It needs to be able to surf the web and look for up-to-date information. ChatGPT search is one powerful way OpenAI’s new tools can now do that. 
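
Connecting a model to outside tools is already possible today through function calling, where the model asks your code to run a function and hand back the result. Here is a hedged sketch: the `web_search` function is hypothetical, and only the `tools`/`tool_calls` plumbing reflects OpenAI’s documented API shape.

```python
# Sketch of exposing a (hypothetical) web-search tool to a model via
# OpenAI function calling. The "web_search" name and arguments are
# illustrative; it is not an OpenAI built-in.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "web_search",  # hypothetical helper you implement yourself
        "description": "Search the web and return top result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What changed in the latest o1 release?"}],
    tools=tools,
)

# If the model decided to call the tool, your code runs the search and
# returns the results in a follow-up "tool" message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```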

These tools need to be able not only to retrieve information but also to take actions in the real world. Competitor Anthropic recently announced a feature that lets its Claude chatbot “use” a computer, interacting with the interface to click on things, for example. This capability will be important if agents are going to execute tasks like booking flights. Godement says o1 can “sort of” use tools, though not very reliably, and that research on tool use is a “promising development.”
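
As a rough illustration of what Anthropic shipped, here is a sketch of requesting the computer-use tool through its Python SDK. The tool type, beta flag, and model name follow Anthropic’s announcement at the time of writing; treat them as assumptions rather than stable API details.

```python
# Sketch of Anthropic's "computer use" beta: the model emits click/type
# actions against a screen you describe. Tool type, beta flag, and model
# name follow Anthropic's announcement; treat them as assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open the flight search page."}],
)

# The response contains tool_use blocks (screenshot, left_click, type, etc.)
# that your own harness must execute on a real or virtual machine, then
# report the results back to the model.
for block in response.content:
    print(block.type)
```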

In the next year, Godement says, he expects adoption of AI for customer support and other assistant-based tasks to grow. However, he says it can be hard to predict how people will adopt and use OpenAI’s technology.

“Frankly, looking back every year, I’m surprised by use cases that popped up that I did not even anticipate,” he says. “I expect there will be quite a few surprises that you know none of us could predict.” 

Now read the rest of The Algorithm

Deeper Learning

This AI-generated version of Minecraft may represent the future of real-time video generation

When you walk around in a version of the video game Minecraft from the AI companies Decart and Etched, it feels a little off. Sure, you can move forward, cut down a tree, and lay down a dirt block, just like in the real thing. If you turn around, though, the dirt block you just placed may have morphed into a totally new environment. That doesn’t happen in Minecraft. But this new version is entirely AI-generated, so it’s prone to hallucinations. Not a single line of code was written.

Ready, set, go: This version of Minecraft is generated in real time, using a technique known as next-frame prediction. The AI companies behind it did this by training their model, Oasis, on millions of hours of Minecraft game play and recordings of the corresponding actions a user would take in the game. The AI is able to sort out the physics, environments, and controls of Minecraft from this data alone. Read more from Scott J. Mulligan.
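
To make “next-frame prediction” concrete, here is a toy training step: a model takes the current frame plus the player’s action and learns to predict the next frame. This is purely illustrative; Oasis’s actual architecture and training setup have not been published in this level of detail, and every name below is hypothetical.

```python
# Toy sketch of action-conditioned next-frame prediction, the technique
# described above. Illustrative only -- this is not Oasis's real model.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    def __init__(self, n_actions: int, hidden: int = 64):
        super().__init__()
        self.encode = nn.Conv2d(3, hidden, 3, padding=1)
        self.action_embed = nn.Embedding(n_actions, hidden)
        self.decode = nn.Conv2d(hidden, 3, 3, padding=1)

    def forward(self, frame, action):
        # Condition the current frame's features on the player's action,
        # then decode a prediction for the next frame.
        h = torch.relu(self.encode(frame))
        h = h + self.action_embed(action)[:, :, None, None]
        return self.decode(h)

model = NextFramePredictor(n_actions=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# One training step on a (frame, action, next_frame) triple from gameplay
# recordings; real systems train on millions of hours of such data.
frame = torch.rand(8, 3, 64, 64)
action = torch.randint(0, 16, (8,))
next_frame = torch.rand(8, 3, 64, 64)

loss = nn.functional.mse_loss(model(frame, action), next_frame)
opt.zero_grad()
loss.backward()
opt.step()
```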

Bits and Bytes

AI search could break the web
At its best, AI search can better infer a user’s intent, amplify quality content, and synthesize information from diverse sources. But if AI search becomes our primary portal to the web, it threatens to disrupt an already precarious digital economy, argues Benjamin Brooks, a fellow at the Berkman Klein Center at Harvard University, who used to lead public policy for Stability AI. (MIT Technology Review)

AI will add to the e-waste problem. Here’s what we can do about it.
Equipment used to train and run generative AI models could produce up to 5 million tons of e-waste by 2030, a relatively small but significant fraction of the global total. (MIT Technology Review)

How an “interview” with a dead luminary exposed the pitfalls of AI
A state-funded radio station in Poland fired its on-air talent and brought in AI-generated presenters. But the experiment caused an outcry and was stopped when one of them “interviewed” a dead Nobel laureate. (The New York Times)

Meta says yes, please, to more AI-generated slop
In Meta’s latest earnings call, CEO Mark Zuckerberg said we’re likely to see “a whole new category of content, which is AI generated or AI summarized content or kind of existing content pulled together by AI in some way.” Zuckerberg added that he thinks “that’s going to be just very exciting.” (404 Media)
