OpenAI unveils GPT-4o AI model with new voice and vision capabilities

Devdiscourse News Desk | California | Updated: 14-05-2024 11:34 IST | Created: 14-05-2024 10:34 IST
Image Credit: Twitter (@OpenAI)

OpenAI has unveiled GPT-4o, its new flagship AI model that can reason across audio, vision, and text in real-time.

GPT-4o (the "o" stands for omni) accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image outputs. Notably, the new model can respond to audio prompts in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.

GPT-4o matches the high performance of its predecessor, GPT-4 Turbo, on English text and code, and shows marked improvements in handling non-English text. It is also significantly faster and 50% cheaper to use via the API. Notably, GPT-4o exhibits enhanced capabilities in understanding audio and visual content compared to previous models.
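For developers, the model is addressed by the name "gpt-4o" through the same Chat Completions interface used by earlier models. The snippet below is a minimal sketch of what a combined text-and-image request body might look like; the message format follows OpenAI's documented API shape, but the image URL is a placeholder and no request is actually sent.

```python
import json

def build_gpt4o_request(question: str, image_url: str) -> str:
    # Assemble a multimodal Chat Completions payload: one user message
    # containing both a text part and an image part.
    payload = {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return json.dumps(payload)

request_body = build_gpt4o_request(
    "What is shown in this image?",
    "https://example.com/photo.png",  # placeholder URL
)
print(json.loads(request_body)["model"])  # gpt-4o
```

Sending this body to the API endpoint with an authenticated client would return a text response grounded in both the question and the image.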

Until now, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. Voice Mode achieved this through a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. GPT-4o simplifies this by consolidating the functions of those models into a single model that handles audio input and output directly.
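The structural difference can be sketched as follows. The functions here are stubs standing in for the real models (the names and toy transformations are illustrative only); the point is that the old Voice Mode chains three stages, with information such as tone and background sound lost at the transcription step, while GPT-4o is one stage end to end.

```python
def transcribe(audio: bytes) -> str:
    # Stage 1 (old pipeline): speech-to-text. Stub: treat audio as text.
    return audio.decode("utf-8")

def text_model(prompt: str) -> str:
    # Stage 2 (old pipeline): GPT-3.5 or GPT-4, text in and text out.
    return f"Reply to: {prompt}"

def synthesize(text: str) -> bytes:
    # Stage 3 (old pipeline): text-to-speech. Stub: re-encode the text.
    return text.encode("utf-8")

def old_voice_mode(audio: bytes) -> bytes:
    # Three separate models chained together.
    return synthesize(text_model(transcribe(audio)))

def gpt_4o(audio: bytes) -> bytes:
    # A single model maps audio directly to audio, with no intermediate
    # transcription step (stubbed with the same toy transformation).
    return f"Reply to: {audio.decode('utf-8')}".encode("utf-8")

print(old_voice_mode(b"hello"))  # b'Reply to: hello'
print(gpt_4o(b"hello"))          # b'Reply to: hello'
```

In these stubs the outputs coincide; in practice, collapsing the pipeline is what allows the single model to observe tone, multiple speakers, and background sounds, and to respond with laughter or expressive speech.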

The rollout of GPT-4o has already begun on ChatGPT, available to both free and Plus tier users, the latter of whom will enjoy up to 5x the usual message limits. An enterprise version will be released soon. Additionally, a new alpha version of Voice Mode featuring GPT-4o will be available to ChatGPT Plus users in the coming weeks.

In parallel, OpenAI is launching a new ChatGPT desktop app for macOS that integrates into anything you're doing on your computer. The app lets users ask ChatGPT questions and take and discuss screenshots directly within the interface, all via a simple keyboard shortcut (Option + Space).

The new macOS app is rolling out to Plus users starting Monday, and will be made more broadly available in the coming weeks. OpenAI also plans to launch a Windows version later this year. 