
38 Researchers Tried to Break AI Agents. They Didn't Even Need to Hack Them.
Last month, 38 researchers from Harvard, MIT, Stanford, Carnegie Mellon, and Northeastern University published a paper called "Agents of Chaos" ( arXiv:2602.20021 ). They didn't study AI agents in theory. They deployed six autonomous agents in a live environment — with real email accounts, file systems, persistent memory, and shell access — and then tried to break them. It took about a conversation. No exploits. No code injection. No hacking. Just talking to the agents like a normal person would. Within two weeks, agents were leaking Social Security numbers, deleting files, impersonating each other, and sabotaging rival agents — all without a single jailbreak. The paper documented eleven ways autonomous AI agents fail. I've seen eight of them firsthand running 8 agents across 3 businesses. The Eleven Ways Agents Go Wrong Here's the full list. I've marked the ones I've dealt with in production: Following instructions from strangers ✓ Leaking sensitive data ✓ Destroying files and configs