
Cert-gating every tool call: zero-trust for AI agents
Two days ago, Anthropic launched Managed Agents — a hosted runtime where tool execution runs in per-session sandboxes with always_ask permission policies that route sensitive tool calls through a human approval step. It is a real improvement over the previous status quo. It also catches roughly the same fraction of real attacks that a string allowlist catches, and for the same reason: the gate is checking the surface form of a tool call, not the provenance of the inputs that shaped it. A prompt injection that arrived via a fetched webpage and got reformulated into a bash command does not look like "suspicious input" at the point of the permission prompt. It looks like a normal tool call that the user is being asked to approve. The gap between an LLM's stated intent and subprocess.run is where agent security actually fails. Most agent frameworks address this with "guardrails" -- prompt-level classifiers that try to catch bad instructions before they reach execution. That is not security
Continue reading on Dev.to
Opens in a new tab
