
38 Researchers Tried to Break AI Agents. They Didn't Even Need to Hack Them.
Last month, 38 researchers from Harvard, MIT, Stanford, Carnegie Mellon, and Northeastern University published a paper called "Agents of Chaos" ( arXiv:2602.20021 ). They didn't study AI agents in theory. They deployed six autonomous agents in a live environment — with real email accounts, file systems, persistent memory, and shell access — and then tried to break them. It took about a conversation. No exploits. No code injection. No hacking. Just talking to the agents like a normal person would. Within two weeks, agents were leaking Social Security numbers, deleting files, impersonating each other, and sabotaging rival agents — all without a single jailbreak. The paper documented eleven ways autonomous AI agents fail. I've seen eight of them firsthand running 8 agents across 3 businesses. The Eleven Ways Agents Go Wrong Here's the full list. I've marked the ones I've dealt with in production: Following instructions from strangers ✓ Leaking sensitive data ✓ Destroying files and configs