Anthropic activates ASL-3 safeguards, establishing the first operational AI safety framework tied to capability thresholds

Event Summary

In May 2025, Anthropic activated ASL-3 (AI Safety Level 3) safeguards for its most capable models, marking the first time an AI lab implemented a tiered safety framework directly tied to capability thresholds. The Responsible Scaling Policy (RSP), cited below in its version 3.0, defines capability thresholds that trigger increasingly stringent safety measures, ranging from monitoring to deployment restrictions. The ASL-3 activation set a precedent for how AI labs could turn safety research into enforceable internal governance rather than relying on voluntary commitments.

Impact Assessment

Paradigm Shift +2 · Long-term

This was the first operational AI safety framework tied to concrete capability thresholds. It created a precedent that safety measures should scale with model capabilities, and it influenced regulatory frameworks including California's SB 53 and EU AI Act implementation.

Affected Groups: AI safety researchers, AI labs, policymakers, regulators

Risk Creation -1 · Medium-term

Critics argued that the thresholds were self-defined and self-enforced, with no independent oversight. This raised questions about whether AI labs can credibly self-regulate against capability thresholds they set themselves.

Affected Groups: public, ethicists, regulators

Consensus and Sources

Importance: L1

Category: Capability Breakthrough

Consensus level: Emerging Consensus

Impact index: 5/10

Responsible Scaling Policy Version 3.0 — Anthropic

URL: https://www.anthropic.com/news/responsible-scaling-policy-v3

Quoted from the source: "We activated ASL-3 safeguards for relevant models in May 2025 and have been working to improve them ever since."