OpenAI releases o1, its first reasoning model that 'thinks before it answers'
Event Summary
On September 12, 2024, OpenAI released o1-preview and o1-mini, the first models in its o1 series (reportedly developed under the codename 'Strawberry'). o1 is a large language model trained with reinforcement learning to work through an internal chain of thought before it answers. In evaluations OpenAI published alongside the release, o1 ranked in the 89th percentile on competitive programming questions (Codeforces), placed among the top 500 US students on a qualifying exam for the USA Mathematical Olympiad (AIME), and exceeded human PhD-level accuracy on GPQA, a benchmark of physics, chemistry, and biology questions. The release showed that scaling compute at inference time, not just at training time, could unlock new capabilities, opening a second axis of AI progress.
Impact Assessment
Capability Leap +3 · Long-term
Demonstrated test-time compute scaling as a new axis of AI progress. o1 exceeded human PhD-level accuracy on the GPQA Diamond science benchmark and, on the 2024 AIME, solved an average of 11.1 of 15 problems with a single sample per problem (12.5 of 15 with consensus among 64 samples), versus 1.8 of 15 for GPT-4o. The gains from chain-of-thought reasoning carried across chemistry, physics, biology, math, and coding, with o1 matching or exceeding expert human performance on several hard reasoning benchmarks.
Affected Groups: AI researchers, mathematicians, scientists, software engineers
Paradigm Shift +3 · Long-term
Changed the AI field's understanding of scaling from a single axis (training compute) to two axes (training and inference compute). 'Reasoning models' became a distinct model category, and labs including Google, Anthropic, and DeepSeek subsequently launched their own. The finding that letting a model 'think' longer improves performance had major implications for AI system design, pricing, and capability forecasting.
Affected Groups: entire AI field, researchers, AI companies, policymakers
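A simple, widely used way to spend extra compute at inference time is to sample several independent answers and take a majority vote (often called self-consistency); OpenAI reported some of o1's AIME results this way, using consensus among 64 samples. The sketch below is a toy model under stated assumptions, not a description of how o1 reasons internally: it assumes each sample is independently correct with a fixed probability `p`, treats every answer as simply right or wrong, and counts the vote as a success only when a strict majority of the `k` samples is correct. The function name `majority_vote_accuracy` and the value `p = 0.6` are hypothetical, chosen for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, k: int) -> float:
    """Probability that a strict majority of k independent samples is correct.

    Toy model (assumption): each sample is correct with probability p,
    independently of the others, and the vote succeeds only if more than
    k/2 of the samples are correct (ties count as failures).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    need = k // 2 + 1  # smallest number of correct samples that is a strict majority
    # Sum the binomial probabilities of getting `need` through `k` correct samples.
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


if __name__ == "__main__":
    p = 0.6  # hypothetical per-sample accuracy, not a measured o1 figure
    for k in (1, 5, 25, 101):
        print(f"k={k:>3}: {majority_vote_accuracy(p, k):.3f}")
```

Under these assumptions, extra samples raise accuracy only when `p > 0.5`; with `p < 0.5` the same vote makes a wrong final answer more likely, so per-sample quality still matters as inference compute grows.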
Risk Creation -1 · Medium-term
Hiding the chain of thought raised transparency concerns: users saw the final answer and a model-generated summary of the reasoning, but not the raw reasoning itself. The model's improved resistance to jailbreaking was a safety gain, but the possibility of deceptive reasoning (the model could 'think' harmful things without revealing them) created new AI governance challenges.
Affected Groups: AI safety researchers, policymakers, ethicists, general public
Consensus and Sources
1. "We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models." (Reference evidence)
2. "OpenAI releases o1, its first model with 'reasoning' abilities." (News report)