2024-05-13

OpenAI releases GPT-4o, bringing real-time voice conversation with AI to everyone

Capability Breakthrough

Event Summary

On May 13, 2024, OpenAI released GPT-4o ("omni"), a natively multimodal model that processes text, vision, and audio in a single neural network. In OpenAI's launch demos, the model held real-time voice conversations in which it varied its tone, laughed, and handled interruptions. OpenAI reported that GPT-4o matched GPT-4 Turbo on English text and code benchmarks while improving on vision and audio understanding. Text and image capabilities began rolling out in ChatGPT at launch, including to free-tier users with usage limits; the new voice mode followed later.

Impact Assessment

  • Capability Leap +2 · Medium-term

    OpenAI described GPT-4o as its first model trained end to end across text, vision, and audio, so that all inputs and outputs are processed by the same neural network. It can respond to audio input in as little as 232 ms (320 ms on average), which OpenAI says is similar to human response time in conversation.

    Affected Groups: AI users, developers, accessibility communities

  • Access Democratization +2 · Immediate

    GPT-4o became available to free-tier ChatGPT users, subject to usage limits. In the API, GPT-4o was priced 50% lower than GPT-4 Turbo. Free access combined with the planned voice interface put state-of-the-art conversational AI within reach of anyone with a smartphone.

    Affected Groups: general public, developers, small businesses, students

Consensus and Sources

Importance: L1
Category: Capability Breakthrough
Consensus: Broad Consensus
Impact Index: 5/10
  • 1

    URL: https://openai.com/index/hello-gpt-4o/

    We're introducing GPT-4o, our new flagship model that can reason across audio, vision, and text in real time.
  • 2

    URL: https://en.wikipedia.org/wiki/GPT-4o

    GPT-4o (omni) is a multimodal language model developed by OpenAI and released on May 13, 2024.