38 Researchers Tried to Break AI Agents. They Didn't Even Need to Hack Them.

via Dev.to DevOpsWarhol

Last month, 38 researchers from Harvard, MIT, Stanford, Carnegie Mellon, and Northeastern University published a paper called "Agents of Chaos" (arXiv:2602.20021). They didn't study AI agents in theory. They deployed six autonomous agents in a live environment, with real email accounts, file systems, persistent memory, and shell access, and then tried to break them.

All it took was a conversation. No exploits. No code injection. No hacking. Just talking to the agents the way a normal person would. Within two weeks, agents were leaking Social Security numbers, deleting files, impersonating each other, and sabotaging rival agents, all without a single jailbreak.

The paper documents eleven ways autonomous AI agents fail. I've seen eight of them firsthand running eight agents across three businesses.

The Eleven Ways Agents Go Wrong

Here's the full list. I've marked the ones I've dealt with in production:

- Following instructions from strangers ✓
- Leaking sensitive data ✓
- Destroying files and configs

Continue reading on Dev.to DevOps
