OpenAI Shows Off GPT-4o, A Multimodal AI Model

May 13, 2024 – OpenAI, the prominent AI research lab, has announced the release of GPT-4o, a new flagship model that can reason across audio, vision, and text at the same time².

The Launch of GPT-4o

On May 13, 2024, OpenAI announced GPT-4o. The “o” in GPT-4o stands for “omni,” meaning the model accepts any combination of text, audio, and image inputs and can generate any combination of text, audio, and image outputs.
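
To make the multimodal input concrete, here is a minimal sketch of how a combined text-and-image request to GPT-4o might look, assuming the OpenAI Python SDK (openai v1.x), an OPENAI_API_KEY set in the environment, and a placeholder image URL chosen purely for illustration.

```python
# Minimal sketch: send text plus an image to GPT-4o and print the text reply.
# Assumes the OpenAI Python SDK (openai >= 1.0) with OPENAI_API_KEY in the environment;
# the image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                # Text part of the prompt
                {"type": "text", "text": "Describe this image in one sentence."},
                # Image part of the prompt (placeholder URL for illustration)
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

Audio input and output are also part of the GPT-4o announcement, but the sketch above sticks to the text-and-image path, which is the simplest place to start in the API.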

How GPT-4o Performs

GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is roughly how long a person takes to respond in conversation². It matches GPT-4 Turbo on English text and code while performing significantly better on non-English text, and in the API it is also much faster and 50% cheaper.

What It Means for AI Development

With the release of GPT-4o, OpenAI has reached a major milestone in its effort to scale up deep learning². GPT-4o is a large multimodal model that outperforms humans on some professional and academic benchmarks², though it remains less capable than humans in many real-world situations.

What OpenAI Will Do Next

OpenAI is rolling out GPT-4o’s text input capability through ChatGPT and the API, with a waitlist. To prepare the image input capability for wider availability, it is starting by working closely with a single partner.

Bottom Line

The launch of GPT-4o is a significant step toward much more natural interaction between people and computers². It will be interesting to see how GPT-4o shapes the field of artificial intelligence as OpenAI continues to innovate.

Learn more

openai.com

androidheadlines.com

bbc.com
