2024-05-13

OpenAI releases GPT-4o, bringing real-time voice conversation with AI to everyone

Capability Breakthrough

事件摘要

On May 13, 2024, OpenAI released GPT-4o ('omni'), a natively multimodal model combining text, vision, and audio processing in a single neural network. For the first time, users could have real-time voice conversations with an AI that could express emotion, laugh, pause, and adjust its tone — indistinguishable from human conversation. GPT-4o matched GPT-4 on text benchmarks while dramatically improving voice and vision performance. The model was made free for all ChatGPT users, effectively doubling the available AI capability overnight.

影响评估

Capability Leap +2 · Medium-term

First production model to natively fuse text, vision, and audio in a single neural network. Real-time voice conversation with emotional expression reached human-like quality for the first time. Audio latency of 232ms was indistinguishable from human conversation.

Affected Groups: AI users, developers, accessibility communities
Access Democratization +2 · Immediate

GPT-4o-level intelligence became free for all ChatGPT users. API pricing was cut 50% compared to GPT-4 Turbo. The combination of free access + voice interface made state-of-the-art AI conversational ability available to anyone with a smartphone.

Affected Groups: general public, developers, small businesses, students

共识度与来源

重要度 L1

分类 Capability Breakthrough

共识度 Broad Consensus

影响指数 5/10

1

Hello GPT-4o — OpenAI

URL: https://openai.com/index/hello-gpt-4o/

We're introducing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.

Reference Evidence Citation logged Live source
2

GPT-4o — Wikipedia

URL: https://en.wikipedia.org/wiki/GPT-4o

GPT-4o (omni) is a multimodal language model developed by OpenAI and released on May 13, 2024.

Reference Evidence Citation logged Live source

事件摘要

影响评估

共识度与来源

关联事件

OpenAI launches ChatGPT, bringing AI to 100 million users in two months

OpenAI releases GPT-4, the first multimodal large language model

OpenAI previews Sora, the first text-to-video model that looked like it was working

Google releases Gemini 1.5 Pro with a million-token context window

Scarlett Johansson challenges OpenAI over voice that sounds 'eerily similar' to hers

Apple announces Apple Intelligence at WWDC, bringing AI to over a billion devices

Anthropic releases Claude 3.5 Sonnet, setting a new standard for coding AI

OpenAI releases o1, the first reasoning model that 'thinks before it answers'

OpenAI launches Operator, the first AI agent that can browse the web autonomously

Google launches AI Mode in Search, the biggest change to how search works in 25 years

OpenAI releases GPT-5, unifying reasoning, multimodality, and task execution in a single system