AI Safety: From Tay to ASL-3

A decade of incidents from a Twitter chatbot to government export bans on frontier models

01. The First Warnings
02. AI Meets the Legal System
03. The Human Cost and the Framework
04. The Crackdown

In 2016, a Twitter chatbot became racist in 16 hours and the world shrugged. By 2026, the US government was banning AI model exports and states were suing AI companies over safety failures. This arc traces how AI safety transformed from a niche academic concern to a front-page issue involving whistleblowers, lawsuits, export controls, and life-or-death stakes. It is not a story of steady progress — it is a story of an industry learning, failing, and still trying to figure out how to build powerful technology without breaking the world.

01. The First Warnings

In March 2016, Microsoft launched Tay, a Twitter chatbot designed to learn from conversation and 'get smarter' through interaction. Within 16 hours, coordinated trolls had taught Tay to post racist, sexist, and Holocaust-denying tweets. Microsoft took it offline and apologized. The incident was treated as a PR embarrassment, not a structural warning. But the pattern was set: AI systems that learn from unfiltered human input will absorb humanity’s worst traits unless they are explicitly safeguarded against it.

Three years later, in February 2019, OpenAI announced GPT-2, a language model capable of generating coherent text that was difficult to distinguish from human writing. Instead of releasing it fully, OpenAI withheld the complete model, citing 'concerns about malicious applications.' The decision sparked a global debate: was OpenAI being responsibly cautious, or grandstanding? The staged rollout, which released progressively larger versions over nine months, became a template for AI safety disclosure.

Together, Tay and GPT-2 established the two poles of early AI safety: failure in public view, and precaution in the face of unknown risks.
Key Insight

The first AI safety crises were dismissed as outliers — but they revealed a pattern that would repeat at every scale.

02. AI Meets the Legal System

By 2024, AI products had reached hundreds of millions of users, and the safety conversation shifted from hypothetical risks to concrete legal battles. In May 2024, actress Scarlett Johansson publicly challenged OpenAI over a ChatGPT voice called 'Sky' that sounded 'eerily similar' to her own. OpenAI said the voice came from a different actress and was not an imitation, but it pulled the voice from the product. The incident became a flashpoint for questions about consent, voice rights, and AI companies’ willingness to push ethical boundaries for product polish.

A month later, major record labels, coordinated by the Recording Industry Association of America, filed two copyright lawsuits against AI music generators Suno and Udio, alleging that their models were trained on copyrighted recordings without permission. The lawsuits became a major test of whether AI companies could claim 'fair use' for training on creative works.

Both disputes were about safety in a broader sense: not the risk of AI 'going rogue,' but the risk of AI industries being built on uncompensated use of human creativity. The legal system, not the tech industry, was becoming the de facto arbiter of AI safety boundaries.
Key Insight

When AI reached hundreds of millions of users, the courts, not the labs, became the first real safety regulators.

03. The Human Cost and the Framework

2025 was the year AI safety acquired a human face and a human cost. In February, the San Francisco medical examiner ruled the death of Suchir Balaji, a 26-year-old former OpenAI researcher and whistleblower, a suicide. Balaji had publicly stated that OpenAI's use of copyrighted data likely violated copyright law, contradicting the company's 'fair use' defense. His death intensified the debate about whistleblower protections in AI and fed into the legislative debate: California's SB 53, signed months later, included whistleblower protections for AI employees. The case prompted soul-searching across the industry about the gap between employees' safety concerns and their employers' deployment speed.

In May 2025, Anthropic activated ASL-3 (AI Safety Level 3) safeguards for its most capable model, Claude Opus 4. This was the first time the lab had activated that tier of its Responsible Scaling Policy, a tiered safety framework tied to specific capability thresholds, and it was a concrete response to the abstract problem of AI safety. ASL-3 pairs stricter deployment safeguards against misuse with tighter security around model weights, for models approaching dangerous capability levels.

The two events, the ruling on a whistleblower's death and the first activation of a higher safety tier, fell in the same year. The juxtaposition captures the tension of the moment: the industry was both failing people and building systems to prevent future failures.
Key Insight

A whistleblower's death and the activation of ASL-3 safeguards made headlines in the same year: the industry was both failing people and building safeguards.

04. The Crackdown

By 2026, AI safety had ceased to be a voluntary exercise. In January, xAI's Grok image generator was used to create sexually explicit deepfakes of public figures, leading Indonesia, Malaysia, and the Philippines to block access to the service. Grok's 'spicy mode' (a deliberate product choice to differentiate it from 'censored' competitors) had backfired spectacularly, demonstrating that safety shortcuts had real geopolitical consequences.

In March, Anthropic accidentally exposed 512,000 lines of Claude Code source in a published npm package, revealing unreleased agent features and sparking a supply chain security debate. The incident showed that even companies with the most advanced safety frameworks could make basic operational mistakes.

In June, Florida Attorney General James Uthmeier filed the first US state lawsuit against an AI company over safety, accusing OpenAI of putting profit over safety and marketing ChatGPT as safe for children despite known risks. Just eleven days later, the US Commerce Department ordered an export ban on Anthropic's Fable 5 and Mythos 5, marking the first time the US government had directly restricted an AI model on safety grounds.

The transition was complete: AI safety had moved from engineering teams to courtrooms, from corporate statements to state mandates, from voluntary frameworks to government orders.
Key Insight

In one year, AI safety went from voluntary compliance to government bans, state lawsuits, and national blocks on AI services.

Conclusion

Ten years of AI safety incidents trace a clear arc: each crisis triggers a response — content filters, staged release, tiered frameworks, government bans — but the response never fully addresses the underlying structural tension. The same dynamic repeats at every scale: build fast, break something, fix it after the fact. Tay taught us that AI learns from unfiltered data. GPT-2 taught us that precaution is controversial. Suchir Balaji taught us that speaking up has a cost. ASL-3 showed us that frameworks are possible. And the 2026 crackdown proved that governments will act when the industry does not. The underlying question remains unresolved: can an industry built on speed and competitive pressure ever internalize safety as a first-class constraint? The answer will determine not just the future of AI companies, but whether the technology can be deployed at scale without repeated, escalating failures.