OpenAI has unveiled GPT-4o (the "o" stands for "omni"), a model not necessarily "smarter" than GPT-4 but distinguished by its ability to process text, visual, and audio input simultaneously with minimal latency and a strikingly human-like voice. Unlike chatbots that suffer from high latency, GPT-4o responds quickly, creating a fluid, natural conversational experience. It also handles interruptions gracefully, pausing its response when spoken over.
GPT-4o's low latency stems from its ability to process all three forms of input—text, visual, and audio—in a single model, rather than relying on separate models chained together. This integration allows it to deliver cohesive responses quickly. OpenAI CTO Mira Murati highlighted that GPT-4o retains the intelligence of GPT-4 while operating much faster, making interactions feel natural and effortless.
The demo showcased GPT-4o's voice capabilities, with the bot responding in a casual, human-like manner, complete with natural pauses and even chuckles, lending it an uncanny human quality. That illusion faded somewhat when two instances of the bot conversed with each other, revealing a more mechanical nature. Despite a few awkward moments, such as the bots coordinating a duet, GPT-4o's voice capabilities are impressive.
In addition to GPT-4o, OpenAI announced a ChatGPT desktop app for macOS, with a Windows version expected later this year. The app is currently accessible to paid ChatGPT subscribers, with a free version to follow at a later date. The web version of ChatGPT already uses GPT-4o, and the model will be available, with some limitations, to free users.
News Source: TechSpot