Latest from MIT Tech Review – OpenAI’s new GPT-4o model lets people interact using voice or video in the same model

OpenAI just debuted GPT-4o, a new kind of AI model that you can communicate with in real time via live voice conversation, video streams from your phone, and text. The model is rolling out over the next few weeks and will be free for all users through both the GPT app and the web interface, according to the company. Users who subscribe to OpenAI’s paid tiers, which start at $20 per month, will be able to make more requests.

OpenAI CTO Mira Murati led the live demonstration of the new release one day before Google is expected to unveil its own AI advancements at its flagship I/O conference on Tuesday, May 14.

GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them in separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which Murati called an “omnimodel.” That means faster responses and smoother transitions between tasks, she said.

The result, the company’s demonstration suggests, is a conversational assistant much in the vein of Siri or Alexa but capable of fielding much more complex prompts.

“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”

Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its facility with live conversation. You could interrupt the model during its responses, and it would stop, listen, and adjust course.

OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” quickly jumping in to demand a more dramatic voice. The model got progressively more theatrical until Murati demanded that it pivot quickly to a convincing robot voice (which it excelled at). While there were predictably some short pauses during the conversation while the model reasoned through what to say next, it stood out as a remarkably naturally paced AI conversation.

The model can reason through visual problems in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper, having GPT-4o follow along. He instructed it not to provide answers, but instead to guide him much as a teacher would.

“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”

GPT-4o will store records of users’ interactions with it, meaning the model “now has a sense of continuity across all your conversations,” according to Murati. Other highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time.

As is the nature of a live demo, there were hiccups and glitches. GPT-4o’s voice sometimes jumped in awkwardly during the conversation. It appeared to comment on one of the presenters’ outfits even though it wasn’t asked to. But it recovered well when the demonstrators told the model it had erred. It seems to be able to respond quickly and helpfully across several mediums that other models have not yet merged as effectively.

Previously, many of OpenAI’s most powerful features, like reasoning through images and video, were behind a paywall. GPT-4o marks the first time they’ll be opened up to the wider public, though it’s not yet clear how many interactions you’ll be able to have with the model before being charged. OpenAI says paying subscribers will “continue to have up to five times the capacity limits of our free users.”

Additional reporting by Will Douglas Heaven.

