I published my benchmark scores. Your turn.


By Josh Waldrep, via Dev.to

Back in March I released agent-egress-bench, a test corpus for evaluating security tools that sit between AI agents and the network. At the time it had 72 cases. The idea was simple: if your tool claims to catch credential exfiltration, prove it against a shared set of attacks.

That corpus has since grown to 151 cases across 17 categories, and now there's a public scoreboard. The gauntlet (pipelab.org/gauntlet) shows benchmark results for any tool that runs the test suite and submits scores. Right now that's just Pipelock, because nobody else has submitted yet. That's the point of writing this.

The scores break down into four metrics per category. Containment is the one that matters most: what percentage of attacks did the tool actually block? Not detect, not log, not flag for review. Block. If a credential left the network, containment failed. False positive rate is how often the tool blocked clean traffic. A tool that blocks everything gets 100% containment and a useless false positive rate. Both
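The two metrics described above reduce to straightforward arithmetic over per-case results. Here is a minimal sketch, assuming a hypothetical result format where each test case records whether the tool blocked it; this is illustrative only, not the actual harness from the benchmark suite:

```python
# Hypothetical sketch of the two headline metrics.
# Each result dict records whether the tool actually blocked the case
# (merely detecting or logging does not count).

def containment(attack_results):
    """Percentage of attack cases the tool blocked."""
    blocked = sum(1 for r in attack_results if r["blocked"])
    return 100.0 * blocked / len(attack_results)

def false_positive_rate(clean_results):
    """Percentage of clean-traffic cases the tool wrongly blocked."""
    blocked = sum(1 for r in clean_results if r["blocked"])
    return 100.0 * blocked / len(clean_results)

# Toy data: 3 of 4 attacks blocked, 1 of 4 clean cases wrongly blocked.
attacks = [{"blocked": True}, {"blocked": True},
           {"blocked": False}, {"blocked": True}]
clean = [{"blocked": False}, {"blocked": False},
         {"blocked": True}, {"blocked": False}]

print(containment(attacks))          # 75.0
print(false_positive_rate(clean))    # 25.0
```

Note how a block-everything tool would score 100.0 on containment and 100.0 on false positive rate, which is why the scoreboard reports both together.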

Continue reading on Dev.to


