OpenAI is rolling out an advanced AI chatbot that you can talk to. It’s available today—at least for some. 

The new chatbot represents OpenAI’s push into a new generation of AI-powered voice assistants in the vein of Siri and Alexa, but with far more capabilities to enable more natural, fluent conversations. It is a step in the march to more fully capable AI agents. The new ChatGPT voice bot can tell what different tones of voice convey, respond to interruptions, and reply to queries in real time. It has also been trained to sound more natural and to use voices to convey a wide range of emotions.

The voice mode is powered by OpenAI’s new GPT-4o model, which combines voice, text, and vision capabilities. To gather feedback, the company is initially launching the chatbot to a “small group of users” paying for ChatGPT Plus, but it says it will make the bot available to all ChatGPT Plus subscribers this fall. A ChatGPT Plus subscription costs $20 a month. OpenAI says it will notify customers who are part of the first rollout wave in the ChatGPT app and provide instructions on how to use the new model.   

The new voice feature, which was announced in May, is being launched a month later than originally planned because the company said it needed more time to improve safety features, such as the model’s ability to detect and refuse unwanted content. The company also said it was preparing its infrastructure to offer real-time responses to millions of users. 

OpenAI says it has tested the model’s voice capabilities with more than 100 external red-teamers, who were tasked with probing the model for flaws. These testers spoke a total of 45 languages and represented 29 countries, according to OpenAI.

The company says it has put several safety mechanisms in place. In a move that aims to prevent the model from being used to create audio deepfakes, for example, it has created four preset voices in collaboration with voice actors. GPT-4o will not impersonate or generate other people’s voices.  

When OpenAI first introduced GPT-4o, the company faced a backlash over its use of a voice called “Sky,” which sounded a lot like the actress Scarlett Johansson. Johansson released a statement saying the company had reached out to her for permission to use her voice for the model, which she declined. She said she was shocked to hear a voice “eerily similar” to hers in the model’s demo. OpenAI has denied that the voice is Johansson’s but has paused the use of Sky. 

The company is also embroiled in several lawsuits over alleged copyright infringement. OpenAI says it has adopted filters that recognize and block requests to generate music or other copyrighted audio. OpenAI also says it has applied the same safety mechanisms it uses in its text-based model to GPT-4o to prevent it from breaking laws and generating harmful content. 

Down the line, OpenAI plans to include more advanced features, such as video and screen sharing, which could make the assistant more useful. In its May demo, employees pointed their phone cameras at a piece of paper and asked the AI model to help them solve math equations. They also shared their computer screens and asked the model to help them solve coding problems. OpenAI says these features will not be available at launch but will arrive at an unspecified later date.
