
OpenAI’s new GPT-4o model lets people interact using voice or video in the same model


GPT-4 offered similar capabilities, giving users multiple ways to interact with OpenAI’s AI offerings. But it siloed them in separate models, leading to longer response times and presumably higher computing costs. GPT-4o has now merged those capabilities into a single model, which OpenAI CTO Mira Murati called an “omnimodel.” That means faster responses and smoother transitions between tasks, she said.
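In API terms, the consolidation means a single model identifier rather than a pipeline of specialized models. Here is a minimal sketch, assuming the official OpenAI Python SDK and a text-only request; the announcement itself doesn't cover the developer API, so treat the details as illustrative:

```python
# A minimal sketch, assuming the official OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# One model name handles the request end to end, rather than
# chaining separate transcription, reasoning, and speech models.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Tell me a bedtime story about robots and love."}
    ],
)
print(response.choices[0].message.content)
```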

The result, the company’s demonstration suggests, is a conversational assistant in the vein of Siri or Alexa but capable of fielding far more complex prompts.

“We’re looking at the future of interaction between ourselves and the machines,” Murati said of the demo. “We think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural.”

Barret Zoph and Mark Chen, both researchers at OpenAI, walked through a number of applications for the new model. Most impressive was its facility with live conversation. You could interrupt the model during its responses, and it would stop, listen, and adjust course. 

OpenAI showed off the ability to change the model’s tone, too. Chen asked the model to read a bedtime story “about robots and love,” quickly jumping in to demand a more dramatic voice. The model got progressively more theatrical until Murati demanded that it pivot quickly to a convincing robot voice (which it excelled at). Though there were, predictably, some short pauses as the model reasoned through what to say next, the exchange stood out as a remarkably natural-sounding AI conversation.

The model can reason through visual problems in real time as well. Using his phone, Zoph filmed himself writing an algebra equation (3x + 1 = 4) on a sheet of paper while GPT-4o followed along. He instructed it not to provide answers but instead to guide him much as a teacher would.

“The first step is to get all the terms with x on one side,” the model said in a friendly tone. “So, what do you think we should do with that plus one?”
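For reference, the solution the model is coaching him toward takes two steps:

```latex
\begin{aligned}
3x + 1 &= 4 \\
3x &= 3 &&\text{(subtract 1 from both sides)} \\
x &= 1 &&\text{(divide both sides by 3)}
\end{aligned}
```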

GPT-4o will store records of users’ interactions with it, meaning the model “now has a sense of continuity across all your conversations,” according to Murati. Other highlights include live translation, the ability to search through your conversations with the model, and the power to look up information in real time. 

