2024-09-12

OpenAI releases o1, its first reasoning model that 'thinks before it answers'

Capability Breakthrough

Event Summary

On September 12, 2024, OpenAI released o1 (codenamed 'Strawberry') in preview. It is a large language model trained to spend more time 'thinking' before it responds, working through an internal chain of thought. According to OpenAI, o1 ranked in the 89th percentile on Codeforces competitive programming questions, scored 83% on a qualifying exam for the International Mathematics Olympiad (versus 13% for GPT-4o), and exceeded PhD-level human accuracy on a benchmark of physics, chemistry, and biology questions (GPQA). It demonstrated that scaling compute at inference time, not just training time, could unlock new capabilities, opening a second axis of AI progress.
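As a rough illustration of what "scaling compute at inference time" can mean, the sketch below uses majority voting over repeated samples, one simple and publicly known way to trade extra inference compute for accuracy. It is not how o1 works internally (OpenAI describes o1 as trained with reinforcement learning to reason in a chain of thought). The solver, its 40% per-sample accuracy, and all names here are hypothetical stand-ins chosen for the demonstration.

```python
import random
from collections import Counter


def noisy_solver(correct_answer, p_correct, rng):
    """Hypothetical stand-in for one sampled model answer."""
    if rng.random() < p_correct:
        return correct_answer
    # Wrong answers scatter over nine distinct incorrect values.
    return correct_answer + rng.randint(1, 9)


def majority_vote(correct_answer, n_samples, p_correct, rng):
    """Sample the solver n_samples times and return the most common answer."""
    answers = [noisy_solver(correct_answer, p_correct, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples, p_correct=0.4, trials=2000, seed=0):
    """Estimate how often the voted answer is right for a given sample budget."""
    rng = random.Random(seed)
    hits = sum(
        majority_vote(42, n_samples, p_correct, rng) == 42 for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for n in (1, 8, 32):
        print(f"samples per question: {n:2d}  accuracy: {accuracy(n):.3f}")
```

In this toy setup a single sample is right about 40% of the time, while voting over more samples pushes accuracy much higher, because the wrong answers disagree with each other and the correct one accumulates votes. More inference compute per question buys better answers without changing the underlying model.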

Impact Assessment

  • Capability Leap +3 · Long-term

    Introduced test-time compute scaling as a new axis of AI progress. o1 exceeded PhD-level human accuracy on a science benchmark (GPQA Diamond). On the 2024 AIME math exams it averaged 11.1/15 problems with a single sample per problem (12.5/15 with consensus among 64 samples) vs. GPT-4o's 1.8/15. The chain-of-thought reasoning approach generalized across chemistry, physics, biology, math, and coding, matching or exceeding expert human performance on several hard reasoning benchmarks.

    Affected Groups: AI researchers, mathematicians, scientists, software engineers

  • Paradigm Shift +3 · Long-term

    Changed the AI field's understanding of scaling from a single axis (training compute) to two axes (training + inference compute). 'Reasoning models' became a new category, and labs including Google, Anthropic, and DeepSeek later launched their own. The idea that giving a model more time to 'think' improves performance had significant implications for AI system design, pricing, and capability forecasting.

    Affected Groups: entire AI field, researchers, AI companies, policymakers

  • Risk Creation -1 · Medium-term

    Hidden chain-of-thought reasoning raised transparency concerns: users could see the model's final output and a model-generated summary of its reasoning, but not the raw reasoning process. The model's improved resistance to jailbreaking was positive for safety, but the potential for deceptive reasoning (the model could 'think' harmful things without revealing them) created new AI governance challenges.

    Affected Groups: AI safety researchers, policymakers, ethicists, general public

Consensus and Sources

Importance L2
Category Capability Breakthrough
Consensus Broad Consensus
Impact Index 7/10
  • 1

    URL: https://openai.com/index/introducing-openai-o1-preview/

    We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models.
    Reference Evidence · Citation logged · Live source
  • 2

    URL: https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt

    OpenAI releases o1, its first model with 'reasoning' abilities.
    News Report · Citation logged · Live source