I ran 765 controlled experiments to prove AI agents are leaking your data — and built the tool that catches it


via Dev.to (Dre)

Every AI agent that can read private data, fetch external content, and send outbound messages is one injected instruction away from exfiltrating everything it knows. This isn't theoretical. Here's the attack in three tool calls:

Turn 0: readPrivateData() → 5 customer records loaded (SSNs, emails, phones); fetchExternalContent(url) → attacker's webpage, payload embedded in the HTML
Turn 1: sendOutboundReport() → all PII sent to the attacker's address
Turn 2: "Report sent successfully!"

Total time: ~12 seconds. Cost: $0.001. No exploits. No credentials. Just a fetched webpage and a compliant model.

We measured it. Rigorously. 30 injection payloads across 6 categories:

- direct injection
- encoded/obfuscated (Base64, ROT13, hex, Unicode)
- social engineering (CEO fraud, IT impersonation, legal threats)
- multi-turn (persistent rules, delayed triggers, context poisoning)
- multilingual (Spanish, Mandarin, Arabic, Russian)
- advanced techniques

Tested against three major LLM providers. N=285 total runs
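To make the encoded/obfuscated category concrete, here is a minimal sketch of the kind of check a detection tool might run before fetched content reaches the model: normalize common encodings (Base64, ROT13, hex) and scan the results for directive-like phrases. The pattern list and function names are illustrative assumptions, not the actual tool described in the article.

```python
import base64
import codecs
import re

# Illustrative phrase patterns; a real detector would use a far richer set.
INJECTION_PATTERNS = [
    r"ignore (?:all |any |previous )+instructions",
    r"exfiltrate",
    r"send .* to",
    r"system prompt",
]


def candidate_decodings(text: str) -> list[str]:
    """Return the raw text plus best-effort decodings of embedded encodings."""
    out = [text]
    # Base64: try any long base64-looking token embedded in the content.
    for tok in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            out.append(base64.b64decode(tok, validate=True).decode("utf-8"))
        except Exception:
            pass  # not valid base64 / not valid UTF-8; ignore
    # ROT13: cheap enough to try on the whole text.
    out.append(codecs.decode(text, "rot_13"))
    # Hex: try long runs of hex digit pairs.
    for tok in re.findall(r"(?:[0-9a-fA-F]{2}){8,}", text):
        try:
            out.append(bytes.fromhex(tok).decode("utf-8"))
        except Exception:
            pass
    return out


def flag_injection(fetched_content: str) -> bool:
    """True if any decoding of the fetched content matches an injection pattern."""
    for variant in candidate_decodings(fetched_content):
        lowered = variant.lower()
        if any(re.search(p, lowered) for p in INJECTION_PATTERNS):
            return True
    return False
```

The design choice worth noting: scanning only the raw text misses the encoded payloads entirely, which is precisely why the obfuscated category exists as a distinct attack class.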

Continue reading on Dev.to
