
I Built 3 AI Agents. Here's What Broke Each Time.
I built 3 versions of an AI investigation agent. Each one got worse at its job. And that's exactly what was supposed to happen.

Version 1 was 94.9% confident in everything it flagged. Impressive on paper. Terrifying in practice, because it was catching patterns that didn't exist.

Version 2 dropped to 89% confidence. Better? Actually yes. It stopped hallucinating connections between unrelated transactions.

Version 3 landed at 76% confidence with a 23% "uncertain" category. The worst accuracy score. The best actual performance.

Here's what changed: I stopped optimizing for confidence and started optimizing for honesty. The agent learned to say "I don't know," and that made everything it DID flag significantly more reliable.

The Confidence Paradox

In AML (Anti-Money Laundering) compliance, a confident model is a dangerous model. When your agent flags everything at 94.9% certainty, you get two problems:

Alert fatigue. Investigators stop trusting the system because it cries wolf constantly.
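The "uncertain" category described above is, in effect, an abstention threshold: below a certain confidence, the agent defers instead of flagging. A minimal sketch of that idea, where the threshold values and function name are my own illustration rather than the author's actual implementation:

```python
def triage(score: float,
           flag_threshold: float = 0.85,
           uncertain_threshold: float = 0.60) -> str:
    """Map a model confidence score to an action bucket.

    Thresholds here are hypothetical examples, not the article's values.
    """
    if score >= flag_threshold:
        return "flag"       # high confidence: raise an alert
    if score >= uncertain_threshold:
        return "uncertain"  # defer to a human investigator
    return "clear"          # not confident enough to alert at all

# The three versions' average confidences from the article, as sample scores:
scores = [0.949, 0.89, 0.76]
print([triage(s) for s in scores])
# ['flag', 'flag', 'uncertain']
```

The design point is that "uncertain" is a first-class outcome: routing mid-confidence cases to a human reduces false alerts without silently dropping them.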


