What to know about the launch of GPT-4o

by Lauren Sforza - 05/13/24 7:23 PM ET

OpenAI on Monday launched its latest artificial intelligence (AI) model, GPT-4o, which promises improvements in its text, vision and audio capabilities.

OpenAI unveiled the model during a live demonstration Monday, with Chief Technology Officer Mira Murati saying it is a “huge step forward with the ease of use” of the system. The launch came just one day before Google’s annual developer conference, scheduled for Tuesday.

Here’s what to know about the launch of GPT-4o.

Improved voice instruction

Users can now show GPT-4o multiple photos and chat with the model about the uploaded images, according to OpenAI.

This can help students work their way through math problems step by step. One of the demonstrations shown during Monday’s launch walked users through a simple math problem without giving away any answers.

A separate video posted by online instruction company Khan Academy demonstrates how the new model can help teach students in real time. In the video, a student shares his screen and works through a problem as the model guides him through it.

A faster model with improved capabilities

Murati said Monday that GPT-4o provides “GPT-4 level intelligence” that is faster and improves the system’s capabilities across text, vision and audio.

“This is really shifting the paradigm into the future of collaboration, where this interaction becomes much more natural and far, far easier,” she said.

OpenAI said its new model can “respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds.” It noted that this is about the same amount of time it takes for humans to respond in a conversation.

The new model launched Monday

GPT-4o is available starting Monday to all users of OpenAI’s ChatGPT AI chatbot, including those who are using the free version.

“GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits,” OpenAI wrote in its update Monday.

The new voice mode will come out in the coming weeks for ChatGPT Plus users, OpenAI CEO Sam Altman wrote on the social platform X.

The model is ‘natively multimodal’

Altman also posted on X that the model is “natively multimodal,” which means that the model can generate content and understand commands through voice, text or images.

In a separate blog post, he said the new voice and video mode “is the best computer interface” he has ever used.

“It feels like AI from the movies; and it’s still a bit surprising to me that it’s real. Getting to human-level response times and expressiveness turns out to be a big change,” he wrote in Monday’s post.
