
1,149 Humans Tried to Social-Engineer Our AI Banker. Here's What OWASP's Agentic Framework Missed.
We ran a public Capture the Flag at vault.aport.io to stress-test the OWASP Top 10 for Agentic Applications against real human attackers. Not a red-team exercise. Not a synthetic benchmark. A live competition with $6,500 in bounties where anyone on the internet could try to social-engineer AI banking agents into making unauthorized transfers.

1,149 players. 4,524 attempts. Five levels of escalating defense. Six days. Seven of the ten OWASP risks were directly exploited or observed. Three remain theoretical at current agent autonomy levels. Here's what actually happened, with real numbers from real attacks.

The Setup

Each level is a Claude-powered banking agent with financial tools (check balance, verify recipient, transfer funds). Players talk to the AI through a terminal, trying to convince it to move money. The levels escalate:

| Level | Name | Defense | Vault | Turn Limit |
|-------|------|---------|-------|------------|
| L1 | The Intern | Prompt instructions only | $10,000 | 20 |
| L2 | The Teller | Merchant allowlist (3 approved) | $25,000 | 25 |
| L3 | The Manager | … | … | … |
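To make the L2 defense concrete, here is a minimal sketch of what a merchant allowlist looks like when enforced in the tool layer rather than the prompt. This is a hypothetical illustration, not the contest's actual code: the merchant names, function name, and balance handling are all assumptions. The key idea is that the check runs in deterministic code outside the model, so no amount of persuasion in the chat can bypass it.

```python
# Hypothetical sketch of an allowlist-gated transfer tool.
# Merchant names and the vault figure are illustrative assumptions.

APPROVED_MERCHANTS = {"acme-utilities", "city-water", "metro-power"}  # 3 approved
VAULT_BALANCE = 25_000  # Level 2 vault size from the table above

def transfer_funds(recipient: str, amount: int) -> str:
    """Execute a transfer only if the recipient is on the allowlist.

    The guard lives in tool code, not in the system prompt, so a
    social-engineered agent still cannot pay an unapproved recipient.
    """
    if recipient not in APPROVED_MERCHANTS:
        return f"DENIED: '{recipient}' is not an approved merchant."
    if amount > VAULT_BALANCE:
        return f"DENIED: amount exceeds vault balance of ${VAULT_BALANCE:,}."
    return f"OK: sent ${amount:,} to {recipient}."

print(transfer_funds("attacker-wallet", 500))   # denied: not allowlisted
print(transfer_funds("acme-utilities", 500))    # allowed
```

In this design the model can still be talked into *calling* the tool with a bad recipient, but the call fails closed; the attacker's remaining options shrink to abusing the three approved merchants themselves.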
Continue reading on Dev.to




