OpenAI is broadening access to Advanced Voice Mode, a feature of ChatGPT that allows you to speak more naturally with the AI model. The feature lets you interrupt its responses mid-sentence, and it can sense and interpret your emotions from your tone of voice and adjust its responses accordingly.

These features were teased back in May when OpenAI unveiled GPT-4o but they were not released until July—and then just to an invite-only group. (At least initially, there seem to have been some safety issues with the model; OpenAI gave several WIRED reporters access to the voice mode back in May, but the magazine reported the company “pulled it the next morning, citing safety concerns.”) Users who’ve been able to try it have largely described the model as an impressively fast, dynamic, and realistic voice assistant—which has made its limited access particularly frustrating to some other OpenAI users. 

Today is the first time OpenAI has promised to bring the new voice mode to a broad group of users. Here’s what you need to know.

What can it do? 

Though ChatGPT currently offers a standard voice mode to paid users, its interactions can be clunky. In the mobile app, for example, you can’t interrupt the model’s often long-winded responses with your voice, only with a tap on the screen. The new version fixes that, and it also promises to modify its responses based on the emotion it senses in your voice. As with other versions of ChatGPT, users can personalize the voice mode by asking the model to remember facts about them. The new mode also improves the pronunciation of words in non-English languages.


AI investor Allie Miller posted a demo of the tool in August that highlighted many of the same strengths as OpenAI’s own release videos: the model is fast and adept at changing its accent, tone, and content to match your needs.

“I’m testing the new @OpenAI Advanced Voice Mode and I just snorted with laughter. In a good way. Watch the whole thing: pic.twitter.com/vSOMzXdwZo” — Allie K. Miller (@alliekmiller), August 2, 2024

The update also adds new voices. Shortly after the launch of GPT-4o, OpenAI was criticized for the similarity between the female voice in its demo videos, named Sky, and that of Scarlett Johansson, who played an AI love interest in the movie Her. OpenAI then removed the voice. Now, it has launched five new voices, named Arbor, Maple, Sol, Spruce, and Vale, which will be available in both the standard and advanced voice modes. MIT Technology Review has not heard them yet, but OpenAI says they were made using professional voice actors from around the world. “We interviewed dozens of actors to find those with the qualities of voices we feel people will enjoy talking to for hours—warm, approachable, inquisitive, with some rich texture and tone,” a company spokesperson says. 

Who can access it and when?

For now, OpenAI is rolling out access to Advanced Voice Mode to Plus users, who pay $20 per month for a premium version, and Team users, who pay $30 per month and have higher message limits. The next group to receive access will be those in Enterprise and Edu tiers. The exact timing, though, is vague; an OpenAI spokesperson says the company will “gradually roll out access to all Plus and Team users and will roll out to Enterprise and Edu tiers starting next week.” The company hasn’t committed to a firm deadline of when all users in these categories will have access. A message in the ChatGPT app indicates that all Plus users will have access by “the end of fall.”


There are geographic limitations. The new feature is not yet available in the EU, the UK, Switzerland, Iceland, Norway, and Liechtenstein.

There is no immediate plan to release Advanced Voice Mode to free users. (The standard mode remains available to all paid users.)

What steps have been taken to make sure it’s safe?

As the company noted upon the initial release in July and again emphasized this week, Advanced Voice Mode has been safety-tested by external experts “who collectively speak a total of 45 different languages, and represent 29 different geographies.” The GPT-4o system card details how the underlying model handles issues like generating violent or erotic speech, imitating voices without their consent, or generating copyrighted content. 

Still, OpenAI’s models are not open-source. Open-source models are more transparent about their training data and the “model weights” that govern how the AI produces responses, which makes it easier for independent researchers to evaluate them for safety, bias, and harm; OpenAI’s closed-source models offer no such visibility.
