OpenAI releases GPT-4o, bringing real-time voice conversation with AI to everyone
Event Summary
On May 13, 2024, OpenAI released GPT-4o ('omni'), a natively multimodal model that processes text, vision, and audio in a single neural network. In the launch demonstrations, the model held real-time voice conversations in which it could express emotion, laugh, pause, and adjust its tone, with response times close to those of human conversation. OpenAI reported that GPT-4o matched GPT-4 Turbo on English text and code benchmarks while improving markedly on audio and vision tasks. The model was made available to free-tier ChatGPT users, subject to usage limits, giving users previously limited to GPT-3.5 access to GPT-4-class capability.
Impact Assessment
-
Capability Leap +2 · Medium-term
First production model to natively fuse text, vision, and audio in a single neural network. Real-time voice conversation with emotional expression approached human-like quality for the first time. OpenAI reported audio response latency as low as 232 ms (320 ms on average), comparable to human response time in conversation.
Affected Groups: AI users, developers, accessibility communities
-
Access Democratization +2 · Immediate
GPT-4o-level intelligence became free for all ChatGPT users, subject to usage limits. API pricing was set 50% below GPT-4 Turbo. The combination of free access and a voice interface made state-of-the-art conversational AI available to anyone with a smartphone.
Affected Groups: general public, developers, small businesses, students
Consensus and Sources
-
1
We're introducing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.
-
2
GPT-4o (omni) is a multimodal language model developed by OpenAI and released on May 13, 2024.