OpenAI’s latest upgrade essentially lets users livestream with ChatGPT

OpenAI's recent announcement of its latest AI model, GPT-4o, marks a notable step forward: the model can process text, audio, and image inputs in real time.


OpenAI, the creator of ChatGPT, has unveiled its newest artificial intelligence (AI) model, GPT-4o. The model is designed to be more conversational and human-like, and it can process and respond to users' audio and video inputs in real time.

In a series of demo videos released by the company, GPT-4 Omni helps prospective users in a variety of scenarios, such as preparing for a job interview by checking that they look polished and presentable, and calling customer support to arrange an iPhone replacement.

Other demos show ChatGPT trading dad jokes, translating a bilingual conversation in real time, judging a game of rock-paper-scissors between two users, and responding with sarcasm when prompted. One video even captures ChatGPT meeting a user's new puppy for the first time.

“Well hello, Bowser! Aren’t you just the most adorable little thing?” the chatbot exclaimed.

Say hello to GPT-4o, our new flagship model, which can reason across audio, vision, and text in real time. Text and image input are rolling out today in the API and ChatGPT, with voice and video capabilities coming in the following weeks.

— OpenAI (@OpenAI) May 13, 2024

Using the new model feels like stepping into a science-fiction movie, CEO Sam Altman suggested in a May 13 blog post: the technology is still a bit surprising even though it is real.

“Getting to human-level response times and expressiveness turns out to be a big change.”

On May 13, OpenAI rolled out GPT-4o's text and image capabilities, with the full version, including voice and video, to follow in the coming weeks, the company said in a post on X.

GPT-4o is expected to be available to all ChatGPT users, including those on free accounts, as well as to developers through OpenAI's application programming interface (API).
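For developers, mixed text-and-image input is sent to GPT-4o using OpenAI's chat message format. As a rough sketch, assuming the documented chat-completions request shape (the prompt and image URL below are placeholders, and the live call is only described in a comment since it requires an API key):

```python
import json

# Hedged sketch: assembling a chat-completions request body that mixes
# text and image input for the "gpt-4o" model. The prompt and image URL
# are placeholders, not from the article.

def build_multimodal_request(prompt: str, image_url: str) -> dict:
    """Build a request body with one user message containing text and an image."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

request = build_multimodal_request(
    "What is in this image?", "https://example.com/photo.jpg"
)
print(json.dumps(request, indent=2))

# To send it for real, POST this body to the chat completions endpoint
# with an Authorization header, or pass the same "model" and "messages"
# to the official openai Python SDK's client.chat.completions.create().
```

The same request shape works for text-only prompts by dropping the `image_url` content part.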

According to OpenAI, the "o" in GPT-4o stands for "omni," marking a step toward more natural human-computer interaction.

Meet GPT-4o, our latest model, capable of reasoning across text, audio, and video in real time. Its versatility makes it fun to explore, and it represents a step toward more natural human-AI, and even AI-to-AI, interaction.

— Greg Brockman (@gdb) May 13, 2024

GPT-4o's ability to handle text, audio, and image inputs natively in a single model is a significant leap over OpenAI's previous tools. Earlier voice interactions chained separate models together, losing information such as tone and background sound between steps, whereas GPT-4o manages all three input types at once.

OpenAI also claims that GPT-4o surpasses its previous models in vision and audio understanding. That goes beyond text: the model can reportedly pick up on a user's emotions and even subtle cues such as breathing patterns.

It is also “much faster” and “50% cheaper” than GPT-4 Turbo in OpenAI’s API.
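The "50% cheaper" claim can be illustrated with launch-day API list prices (USD per million tokens, as published in May 2024 and subject to change): GPT-4 Turbo at $10 input / $30 output versus GPT-4o at $5 / $15. A quick back-of-the-envelope comparison:

```python
# Back-of-the-envelope cost comparison using launch-day list prices
# (USD per 1M tokens, May 2024; prices change over time).
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the per-million-token list price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A hypothetical workload: 100k input tokens, 20k output tokens.
turbo = cost_usd("gpt-4-turbo", 100_000, 20_000)  # $1.00 + $0.60 = $1.60
omni = cost_usd("gpt-4o", 100_000, 20_000)        # $0.50 + $0.30 = $0.80
print(f"GPT-4 Turbo: ${turbo:.2f}  GPT-4o: ${omni:.2f}")
```

At these rates the GPT-4o request costs exactly half as much, matching the headline figure.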

According to OpenAI, the new model can respond to audio inputs in as little as 232 milliseconds, with an average response time of 320 milliseconds, similar to human response times in conversation.


2024-05-14 03:40