OpenAI unveils GPT-4o AI model with new voice and vision capabilities

Devdiscourse News Desk | California | Updated: 14-05-2024 11:34 IST | Created: 14-05-2024 10:34 IST
Image Credit: Twitter (@OpenAI)

OpenAI has unveiled GPT-4o, its new flagship AI model that can reason across audio, vision, and text in real-time.

GPT-4o (the "o" stands for omni) accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image outputs. Notably, the new model can respond to audio prompts in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.

GPT-4o matches the high performance of its predecessor, GPT-4 Turbo, on English text and code, and shows marked improvements in handling non-English text. It is also significantly faster and 50% cheaper to use via the API. Notably, GPT-4o exhibits enhanced capabilities in understanding audio and visual content compared to previous models.
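For developers, the model is addressed by the name "gpt-4o" through the same Chat Completions interface used by earlier models. The snippet below is a minimal sketch of what a combined text-and-image request body might look like; the message format follows OpenAI's documented API shape, but the image URL is a placeholder and no request is actually sent.

```python
import json

def build_gpt4o_request(question: str, image_url: str) -> str:
    # Assemble a multimodal Chat Completions payload: one user message
    # containing both a text part and an image part.
    payload = {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return json.dumps(payload)

request_body = build_gpt4o_request(
    "What is shown in this image?",
    "https://example.com/photo.png",  # placeholder URL
)
print(json.loads(request_body)["model"])  # gpt-4o
```

Sending this body to the API endpoint with an authenticated client would return a text response grounded in both the question and the image.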

Until now, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. Voice Mode achieved this through a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. GPT-4o simplifies this by consolidating the functions of those models into a single model that handles audio input and output directly.
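The structural difference can be sketched as follows. The functions here are stubs standing in for the real models (the names and toy transformations are illustrative only); the point is that the old Voice Mode chains three stages, with information such as tone and background sound lost at the transcription step, while GPT-4o is one stage end to end.

```python
def transcribe(audio: bytes) -> str:
    # Stage 1 (old pipeline): speech-to-text. Stub: treat audio as text.
    return audio.decode("utf-8")

def text_model(prompt: str) -> str:
    # Stage 2 (old pipeline): GPT-3.5 or GPT-4, text in and text out.
    return f"Reply to: {prompt}"

def synthesize(text: str) -> bytes:
    # Stage 3 (old pipeline): text-to-speech. Stub: re-encode the text.
    return text.encode("utf-8")

def old_voice_mode(audio: bytes) -> bytes:
    # Three separate models chained together.
    return synthesize(text_model(transcribe(audio)))

def gpt_4o(audio: bytes) -> bytes:
    # A single model maps audio directly to audio, with no intermediate
    # transcription step (stubbed with the same toy transformation).
    return f"Reply to: {audio.decode('utf-8')}".encode("utf-8")

print(old_voice_mode(b"hello"))  # b'Reply to: hello'
print(gpt_4o(b"hello"))          # b'Reply to: hello'
```

In these stubs the outputs coincide; in practice, collapsing the pipeline is what allows the single model to observe tone, multiple speakers, and background sounds, and to respond with laughter or expressive speech.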

The rollout of GPT-4o has already begun on ChatGPT, available to both free and Plus tier users, the latter of whom will enjoy up to 5x the usual message limits. An enterprise version will be released soon. Additionally, a new alpha version of Voice Mode featuring GPT-4o will be available to ChatGPT Plus users in the coming weeks.

In parallel, OpenAI is launching a new ChatGPT desktop app for macOS that integrates into anything you're doing on your computer. The app lets users ask ChatGPT questions and take and discuss screenshots directly within the interface, all via a simple keyboard shortcut (Option + Space).

The new macOS app is rolling out to Plus users starting Monday, and will be made more broadly available in the coming weeks. OpenAI also plans to launch a Windows version later this year. 