May 13, 2024 – OpenAI, one of the leading AI research labs, has announced the release of GPT-4o, a new flagship model that can reason across audio, vision, and text in real time.
The Launch of GPT-4o
On May 13, 2024, OpenAI announced GPT-4o to the world. The “o” in GPT-4o stands for “omni,” reflecting that the model accepts any combination of text, audio, and images as input and can generate any combination of text, audio, and images as output.
How Well GPT-4o Works
GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo’s performance on English text and code, while performing significantly better on text in non-English languages. It is also much faster and 50% cheaper in the API.
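For readers who want to try the model through the API, here is a minimal sketch of a text-only request to GPT-4o using the OpenAI Python SDK. The prompt is purely illustrative, and the snippet assumes the openai package is installed and an API key is available in the environment.

```python
# Minimal sketch: calling GPT-4o via the OpenAI Python SDK.
# Assumes `pip install openai` and that OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "In one sentence, what does the 'o' in GPT-4o stand for?"}
    ],
)

# Print the model's text reply
print(response.choices[0].message.content)
```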
What It Means for AI Development
With the release of GPT-4o, OpenAI has reached a major milestone in its effort to scale up deep learning. GPT-4o is a large multimodal model that exceeds human performance on some professional and academic benchmarks, though it still falls short of humans in many real-world scenarios.
What OpenAI Will Do Next
Through a waitlist, OpenAI is rolling out GPT-4o’s text input capability via ChatGPT and the API. To start, OpenAI is working closely with a single partner to prepare the image input capability for wider availability.
Bottom Line
The launch of GPT-4o is a significant step toward much more natural human-computer interaction. It will be interesting to see how GPT-4o reshapes the field of artificial intelligence as OpenAI continues to innovate.