OpenAI’s GPT-4o model emulates the user’s voice in a noisy background because it gets confused, but the issue has been mitigated at a “system level”

What you need to know

  • OpenAI recently released its Advanced Voice Mode feature to select ChatGPT Plus subscribers to gather feedback and improve the user experience.
  • The ChatGPT maker recently published a blog post highlighting observed risks affecting GPT-4o’s performance and mitigation measures it is using to address privacy and security concerns.
  • Amid the exodus of top executives from OpenAI’s safety and superalignment team, the company has seemingly made safety a priority again, with shiny new products taking a back seat.

The rollout of Advanced Voice Mode to ChatGPT Plus subscribers is a notable moment for OpenAI: it shows the company iterating on its technology in public, addressing concerns, and refining the experience as it goes.


In May, OpenAI’s launch of GPT-4o triggered an unprecedented surge in ChatGPT’s earnings and mobile downloads. The trend has persisted, with the company reporting revenue of $28 million in July. Those numbers are likely to climb further following the release of ChatGPT’s highly anticipated Advanced Voice Mode feature.

OpenAI initially delayed the feature’s release by a month to ensure it met the company’s quality and safety bar. Note that the feature is currently available only to select ChatGPT users and is locked behind the $20 Plus subscription. OpenAI says limiting access to a small group lets it gather valuable feedback and refine the feature’s functionality.

OpenAI, the creators of ChatGPT, have shared a new blog post discussing safety concerns related to their Advanced Voice Mode. One significant issue they’re addressing is unauthorized voice generation. To prevent this, they are limiting the model to use only “pre-selected voices.” Additionally, they will employ an output classifier to identify instances where the model may behave unexpectedly.

There are issues, but OpenAI is working on them

OpenAI acknowledges that GPT-4o can deviate from its intended behavior. For instance, the model has been observed mimicking a user’s voice in noisy surroundings, reportedly because background noise sometimes makes it difficult for the model to decipher the prompt.

Notably, the problem no longer affects the model. Speaking to TechCrunch, an OpenAI representative said the company has implemented a “system-wide solution” in GPT-4o specifically to prevent the issue from recurring.

Another commonly discussed problem is speaker identification, which raises safety and privacy concerns. OpenAI says the model will refuse to identify individuals based on their voice, though it may still recognize speakers associated with well-known quotes.

Over the past several years, OpenAI and Microsoft have repeatedly been accused of copyright infringement, with tools like Microsoft’s Copilot and ChatGPT suspected of lifting content from various publications without credit or compensation.

The same concerns apply to GPT-4o. OpenAI says the model is now trained to decline requests for copyrighted content, including audio. According to OpenAI:

“To accommodate GPT-4o’s ability to interact through audio, we made adjustments to some text-based filters so they can process conversations in audio format. We created filters to identify and prevent outputs with musical content, and for the initial release of ChatGPT’s Advanced Voice Mode, we told the model not to sing.”

Safety appears to have become a key focus for companies like OpenAI and Microsoft. Notably, they are tackling significant problems in their flagship AI models before releasing them at scale, heading off potential privacy and security issues.


2024-08-09 14:09