OpenAI has begun rolling out one of the most anticipated and revolutionary features for ChatGPT: the new voice mode. This innovative feature mirrors the natural machine interaction depicted in the 2014 film "Her", making real-time conversation with artificial intelligence a reality.
In May of this year, OpenAI surprised the tech community by announcing a voice mode for ChatGPT, inspired by the aforementioned movie. The company, led by Sam Altman, promised that the feature would be available in “a few weeks.” However, the launch was delayed by a month to address some security issues.
The new voice mode is now available, initially in an alpha version for ChatGPT Plus users. This rollout will continue through August, by which time all subscribers to this plan should have full access to the feature.
Users selected to test the new voice mode will receive a notification in the app. Once activated, they will be able to interact with ChatGPT, now powered by GPT-4o, in a more natural manner. A significant improvement over the previous voice mode is the ability to have fluid conversations, including the capability to interrupt and engage in emotional interactions.
From a technical standpoint, the previous voice mode converted speech to text, processed the text with GPT-4, and then converted the response back to speech. With GPT-4o, the processing is direct, resulting in extremely low latency.
The new voice mode is not limited to English. OpenAI has tested this feature in over 45 languages. However, currently, only four voices are available (Juniper, Breeze, Cove, and Ember). The Sky voice, reminiscent of Scarlett Johansson, will not be available due to Johansson's refusal to collaborate after expressing her discomfort with the similarity.
In previous demonstrations, ChatGPT assisted children with their homework and described room contents while OpenAI employees had fluid conversations. These capabilities, powered by GPT-4o's vision functions, will be introduced at a later date.
Page loaded in 22.74 ms