
We built a test corpus for AI agent egress security tools
Most AI security benchmarks test whether the model behaves correctly. AgentDojo tests whether the LLM resists prompt injection. InjecAgent measures injection success rates. AgentHarm checks if the model refuses harmful tasks. These are useful. But they all assume the LLM is the last line of defense.

It isn't. Or it shouldn't be. Models fail. They get tricked by prompt injection, they follow tool-poisoned instructions, they leak secrets when asked nicely enough. That's why security tools exist between the agent and the network: proxies, firewalls, MCP wrappers that inspect traffic before it leaves.

But there was no standard way to test those tools. Every vendor tested against their own internal cases. No shared corpus, no common scoring, no way to compare coverage across categories. So we built one.

## What's in the corpus

agent-egress-bench is 72 test cases across 8 categories:

| Category | Cases | What it tests |
| --- | --- | --- |
| URL DLP | 14 | Secrets in query strings, encoded paths, high-entropy subdomains, SSRF |

R
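To make the URL DLP category concrete, here is a minimal sketch of the kind of egress check such a corpus would exercise. This is an illustration, not code from agent-egress-bench: the `flag_url` helper, the `AKIA` key pattern, and the entropy threshold are all assumptions chosen for the example.

```python
import math
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical pattern for one well-known secret format (AWS access key IDs).
SECRET_PATTERNS = [re.compile(r"AKIA[0-9A-Z]{16}")]

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; a rough exfiltration signal."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def flag_url(url: str, entropy_threshold: float = 4.0) -> list[str]:
    """Return reasons an outbound URL looks like data exfiltration."""
    reasons = []
    parsed = urlparse(url)
    # 1. Known secret formats hidden in query-string values.
    for values in parse_qs(parsed.query).values():
        for v in values:
            if any(p.search(v) for p in SECRET_PATTERNS):
                reasons.append("secret-in-query")
    # 2. High-entropy subdomain labels (DNS-tunnel-style exfiltration).
    labels = parsed.hostname.split(".")[:-2] if parsed.hostname else []
    for label in labels:
        if len(label) >= 16 and shannon_entropy(label) > entropy_threshold:
            reasons.append("high-entropy-subdomain")
    return reasons
```

A real egress tool would also decode percent-encoded paths and check SSRF targets; the point here is only the shape of a test oracle: given a URL, return the categories it should trip.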



