OpenAI’s new GPT-4o model lets people interact using voice or video in the same model

GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them in separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which OpenAI CTO Mira Murati called an “omnimodel.” That means faster responses and smoother transitions between tasks, she said.

The result, the company’s demonstration suggests, is a conversational assistant much in the vein of Siri or Alexa but capable of fielding much more complex prompts.

“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”

Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its facility with live conversation. You could interrupt the model during its responses, and it would stop, listen, and adjust course. 

OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” quickly jumping in to demand a more dramatic voice. The model got progressively more theatrical until Murati asked it to pivot quickly to a convincing robot voice (which it excelled at). Although there were, predictably, some short pauses as the model reasoned through what to say next, the exchange stood out as a remarkably naturally paced AI conversation.

The model can reason through visual problems in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper, having GPT-4o follow along. He instructed it not to provide answers, but instead to guide him much as a teacher would.

“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”
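
For reference, the equation from the demo can be worked through in two steps. This solution is a reader's aid and was not part of the demonstration, in which the model was told to withhold the answer:

$$
\begin{aligned}
3x + 1 &= 4 \\
3x &= 3 \quad \text{(subtract 1 from both sides)} \\
x &= 1 \quad \text{(divide both sides by 3)}
\end{aligned}
$$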

GPT-4o will store records of users’ interactions with it, meaning the model “now has a sense of continuity across all your conversations,” according to Murati. Other highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time.
