I published my benchmark scores. Your turn.


By Josh Waldrep, via Dev.to

Back in March I released agent-egress-bench, a test corpus for evaluating security tools that sit between AI agents and the network. At the time it had 72 cases. The idea was simple: if your tool claims to catch credential exfiltration, prove it against a shared set of attacks.

That corpus has since grown to 151 cases across 17 categories, and now there's a public scoreboard. The gauntlet (pipelab.org/gauntlet) shows benchmark results for any tool that runs the test suite and submits scores. Right now that's just Pipelock, because nobody else has submitted yet. That's the point of writing this.

The scores break down into four metrics per category. Containment is the one that matters most: what percentage of attacks did the tool actually block? Not detect, not log, not flag for review. Block. If a credential left the network, containment failed. False positive rate is how often the tool blocked clean traffic. A tool that blocks everything gets 100% containment and a useless false positive rate. Both
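The two metrics described above reduce to straightforward arithmetic over per-case results. Here is a minimal sketch, assuming a hypothetical result format where each test case records whether the tool blocked it; this is illustrative only, not the actual harness from the benchmark suite:

```python
# Hypothetical sketch of the two headline metrics.
# Each result dict records whether the tool actually blocked the case
# (merely detecting or logging does not count).

def containment(attack_results):
    """Percentage of attack cases the tool blocked."""
    blocked = sum(1 for r in attack_results if r["blocked"])
    return 100.0 * blocked / len(attack_results)

def false_positive_rate(clean_results):
    """Percentage of clean-traffic cases the tool wrongly blocked."""
    blocked = sum(1 for r in clean_results if r["blocked"])
    return 100.0 * blocked / len(clean_results)

# Toy data: 3 of 4 attacks blocked, 1 of 4 clean cases wrongly blocked.
attacks = [{"blocked": True}, {"blocked": True},
           {"blocked": False}, {"blocked": True}]
clean = [{"blocked": False}, {"blocked": False},
         {"blocked": True}, {"blocked": False}]

print(containment(attacks))          # 75.0
print(false_positive_rate(clean))    # 25.0
```

Note how a block-everything tool would score 100.0 on containment and 100.0 on false positive rate, which is why the scoreboard reports both together.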

Continue reading on Dev.to


