OpenAI has announced what it claims is the next big leap in its AI endeavours: a new model dubbed GPT-4o, with the “o” standing for “omni”, reflecting its ability to accept “any combination of text, audio, and image” as input. It can likewise produce output in any combination of those media, and it can respond to audio “in as little as 232 milliseconds”.
The GPT-4o announcement included a live conversation between the model and a presenter. As with existing chatbots, a one-line question can still draw a multi-sentence answer, but with the new model you can interrupt it mid-reply if it starts to ramble, and it will drop the rest of its response rather than talk over you.
The voices sound natural too, shifting intonation and pausing the way people do when speaking. The model even picks up on sarcasm, and one of OpenAI’s demos shows two GPT-4o instances holding a conversation and then improvising a song together, taking turns to sing lines.
Across the various demos, GPT-4o shows off distinct intonations, for example while commenting on someone’s appearance through a phone’s front-facing camera. It is also shown tutoring mathematics, helping with Spanish, and interpreting speech in real time so that an English speaker can hold a conversation with an Italian speaker.