OpenAI, the artificial intelligence (AI) company behind the popular ChatGPT, announced Monday (May 13) the launch of GPT-4o — with the “o” standing for “omni” — a new language model that is free for all users. Its improvements in text, vision and audio processing could boost commerce applications.
“GPT-4o is much faster and enhances capabilities across text, vision, and audio,” said Mira Murati, OpenAI’s chief technology officer, during a livestreamed announcement. The company said the new model will be freely available to all users, while paid subscribers will get up to five times the capacity limits of free users.
The launch of GPT-4o comes as tech giants and startups race to develop and deploy advanced AI systems for various applications. OpenAI’s latest update could further cement the company’s position as a leader in the rapidly evolving field of AI.
“OpenAI’s ChatGPT was already the most impressive ‘chatbot’ for human-like conversations given the range of voices and its ability to understand nuance,” Antony Cousins, executive director of AI Strategy at Cision, told PYMNTS. “The speed enhancements now make this next to indistinguishable from a human conversation. Roles for AI in human companionship now seem really likely, not just possible.”
GPT-4o will enable ChatGPT to interact using text, voice and vision, meaning it can view screenshots, photos, documents or charts uploaded by users and converse about them. Murati said ChatGPT will also gain memory, learning from previous conversations with users, as well as real-time translation.
“They’re adding this model to the ChatGPT interface you’re using as an individual, but what’s even more exciting is they’re adding it to their API, which means software providers will be able to start building these capabilities into their software,” Cousins said. “That’s where these capabilities, if they are as fast as they claim, average of 320 milliseconds, will start to make the software experience more AI-first, with fewer button clicks, more voice and interaction. Very exciting for how we interact but especially exciting for potentially giving software companies a silver bullet for meeting the requirements of the incoming European Accessibility Act.”
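For developers curious what that looks like in practice, here is a minimal sketch of a GPT-4o call through OpenAI’s chat completions API, using the company’s Python SDK. The prompt and system message are illustrative, not taken from the announcement.

```python
# Minimal sketch: calling GPT-4o through OpenAI's chat completions API.
# Assumes the `openai` Python SDK (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Illustrative prompt; this is the same interface software providers
        # could build into their own products.
        {"role": "system", "content": "You are a concise shopping assistant."},
        {"role": "user", "content": "Suggest a gift under $50 for a coffee lover."},
    ],
)
print(response.choices[0].message.content)
```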
During the livestream, OpenAI demonstrated GPT-4o’s ability to adapt to various use cases, providing instructions for solving math problems, telling bedtime stories and offering coding advice. In one demonstration, the model responded in a natural, human-like voice and even showed it could sing.
GPT-4o also displayed impressive multi-modal capabilities, analyzing an image of a chart and discussing its contents. This feature could open up new applications for the technology, such as in data analysis and visualization.
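As a rough illustration of how such an image request is expressed through the same API, the sketch below attaches a chart to a question; the URL and prompt are placeholders.

```python
# Minimal sketch: asking GPT-4o about an image, such as a chart.
# Assumes the `openai` Python SDK (v1.x); the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            # A message can mix text and image parts in a single request.
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sales-chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```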
Jacob Kalvo, co-founder and CEO at Live Proxies, told PYMNTS that voice-driven AI, equipped with real-time response capabilities and the ability to adjust responses based on tone and context, is poised to change the commerce field significantly. He said advanced voice assistants are expected to enhance and personalize the shopping experience.
“This, in fact, can bring possible enhancement in customer experience, smoother interactions, engagement in a personalized way which eventually could increase satisfaction, hence retention,” he added.
Kalvo said that AI vision features are set to make commerce more dynamic by adding visual capabilities.
“So, businesses will be able to offer innovative services like visual search, where customers use images to search for products, to raise the level of user experience and engagement,” he added. “In addition, this capability to analyze visual data in real-time will assist businesses in offering more interactive and personalized service, for example, in the retail and real estate industries.”
The GPT-4o update will let users alter the emotional tone of the chatbot’s voice responses, giving them more control over how it replies, Lucas Ochoa, CEO and founder at AI startup Automat, told PYMNTS.
“In earlier voice interactive models, three components — transcription, intelligence and text-to-speech — worked together to provide the voice function,” he added. “However, this caused significant delays and disrupted the smooth experience.”
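As a rough sketch of the older pipeline Ochoa describes, built from OpenAI’s separate services (the specific models and file names here are assumptions for illustration): audio is transcribed, a chat model generates the reply, and a text-to-speech model voices it, with each hand-off adding latency.

```python
# Sketch of the pre-GPT-4o three-step voice pipeline: transcription,
# intelligence, then text-to-speech. Model names and file names are
# illustrative assumptions. Each network round trip adds delay.
from openai import OpenAI

client = OpenAI()

# 1. Transcription: speech -> text
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

# 2. Intelligence: text -> text
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text-to-speech: text -> speech
speech = client.audio.speech.create(
    model="tts-1", voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("answer.mp3")
```

GPT-4o is designed to handle voice within a single model rather than chaining separate services, which is what removes those hand-offs.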
OpenAI is also releasing a desktop app designed to enhance users’ interactions with ChatGPT by integrating it into their workflow. During its demonstration, OpenAI showed that users can open the ChatGPT window alongside other applications, allowing them to use text or voice to query ChatGPT about on-screen content, with the AI responding based on the visible information.
The app features an Option + Space keyboard shortcut for quick access and allows users to capture and discuss screenshots directly within the app. The desktop app is available to ChatGPT Plus subscribers starting today, and OpenAI will make it accessible to all free and paid users in the coming weeks.
During the demonstration, GPT-4o showed it could understand users’ emotions by listening to their breathing. When it noticed a user was stressed, it offered advice to help them relax. The model also showed it could converse in multiple languages, translating and answering questions automatically.
OpenAI’s announcements show just how quickly the world of AI is advancing, Natalie Lambert, founder and managing partner at GenEdge Consulting, told PYMNTS.
“From the improvements in the models and the speed in which they work to the ability to bring multi-modal capabilities together into one omni-modal interface, it is incredible and will change how people interact with these tools,” Lambert added. “For example, imagine having a real-time conversation with ChatGPT about a website or other piece of content while looking at the same page together. ChatGPT can truly take on an advisor role to teams to improve the impact of content.”