2024-09-12

OpenAI releases o1, the first reasoning model that 'thinks before it answers'

Capability Breakthrough

Event Summary

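On September 12, 2024, OpenAI released o1-preview and o1-mini, the first models in its o1 series (developed under the codename 'Strawberry'). The models are trained with large-scale reinforcement learning to spend more time 'thinking' before responding, working through an internal chain of thought before producing an answer. OpenAI reported that o1:

- ranked in the 89th percentile on competitive programming questions (Codeforces);
- placed among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME);
- exceeded human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA).

The release showed that scaling compute at inference time, not only at training time, could unlock new capabilities. This opened a second axis of AI progress.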

Impact Assessment

Capability Leap +3 · Long-term

Introduced test-time compute scaling as a new axis of AI progress. OpenAI reported that o1 exceeded human PhD-level accuracy on the GPQA Diamond science benchmark. On the 2024 AIME, it averaged 11.1/15 problems with a single sample per problem (12.5/15 with consensus among 64 samples), versus GPT-4o's 1.8/15. The chain-of-thought approach generalized across chemistry, physics, biology, math, and coding, matching or exceeding human expert performance on several hard reasoning benchmarks.

Affected Groups: AI researchers, mathematicians, scientists, software engineers
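
OpenAI's announcement also reported o1's 2024 AIME score under "consensus among 64 samples". This means majority voting over independently sampled final answers, one simple way to turn extra inference-time compute into accuracy. The Python sketch below shows only that voting step. It assumes each sample's final answer has already been extracted as a string. The helper name `consensus_answer` is hypothetical, and the sketch illustrates consensus voting in general, not how o1 reasons internally.

```python
from collections import Counter


def consensus_answer(samples: list[str]) -> str:
    """Return the most common final answer among sampled outputs.

    Hypothetical helper illustrating consensus@k (majority voting).
    Ties go to the answer encountered first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not samples:
        raise ValueError("need at least one sample")
    # Strip surrounding whitespace so "204" and " 204" count as one answer.
    counts = Counter(s.strip() for s in samples)
    answer, _count = counts.most_common(1)[0]
    return answer


if __name__ == "__main__":
    print(consensus_answer(["204", "210", "204", " 204", "033"]))
```

For example, `consensus_answer(["204", "210", "204", " 204", "033"])` returns `"204"`, because that answer appears three times after whitespace is stripped. Voting helps only when wrong samples scatter across different answers. It cannot recover an answer that no sample produced.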

Paradigm Shift +3 · Long-term

Changed the AI field's understanding of scaling from a single axis (training compute) to two axes (training and inference compute). "Reasoning models" became a recognized category, and other labs, including Google, DeepSeek, and Anthropic, subsequently launched their own. The finding that more time to "think" improves performance affected three areas:

- AI system design;
- pricing, since hidden reasoning tokens are billed as output tokens and harder queries therefore cost more;
- capability forecasting.

Affected Groups: entire AI field, researchers, AI companies, policymakers

Risk Creation -1 · Medium-term

Hiding the chain of thought raised transparency concerns. Users saw the final answer and a model-generated summary of the reasoning, but not the raw reasoning itself. The model's improved resistance to jailbreaking was a safety gain. However, the possibility of deceptive reasoning, where the model reasons toward harmful conclusions without revealing them, created new AI governance challenges.

Affected Groups: AI safety researchers, policymakers, ethicists, general public

Consensus and Sources

Importance: L2

Category: Capability Breakthrough

Consensus: Broad Consensus

Impact Index: 7/10

1

Introducing OpenAI o1 — OpenAI

URL: https://openai.com/index/introducing-openai-o1-preview/

We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models.

Reference Evidence
2

OpenAI releases o1, its first model with reasoning abilities — The Verge

URL: https://www.theverge.com/2024/9/12/24242439/openai-o1-model-reasoning-strawberry-chatgpt

OpenAI releases o1, its first model with 'reasoning' abilities.

News Report

Related Events

"Attention Is All You Need" — the Transformer architecture is born

OpenAI releases GPT-4, the first multimodal large language model

OpenAI releases GPT-4o, bringing real-time voice conversation with AI to everyone

Google launches Gemini 2.0 with Project Mariner, unveiling its vision for an AI agent ecosystem

DeepSeek releases R1, an open-source reasoning model that shocks global markets

OpenAI launches Operator, the first AI agent that can browse the web autonomously

Google DeepMind's Gemini Deep Think achieves gold medal at the International Mathematical Olympiad

OpenAI releases GPT-5, unifying reasoning, multimodality, and task execution in a single system