Cert-gating every tool call: zero-trust for AI agents

Two days ago, Anthropic launched Managed Agents — a hosted runtime where tool execution runs in per-session sandboxes with always_ask permission policies that route sensitive tool calls through a human approval step. It is a real improvement over the previous status quo. It also catches roughly the same fraction of real attacks that a string allowlist catches, and for the same reason: the gate is checking the surface form of a tool call, not the provenance of the inputs that shaped it. A prompt injection that arrived via a fetched webpage and got reformulated into a bash command does not look like "suspicious input" at the point of the permission prompt. It looks like a normal tool call that the user is being asked to approve. The gap between an LLM's stated intent and subprocess.run is where agent security actually fails. Most agent frameworks address this with "guardrails" -- prompt-level classifiers that try to catch bad instructions before they reach execution. That is not security

Cert-gating every tool call: zero-trust for AI agents

Related Articles

Floating point from scratch: Hard Mode

Using XSLT to analyse large XML datasets

Put your SSH keys in your TPM chip

Meet Kiki - an array language

Ursa - a new Iceberg-first storage engine for Kafka

Related Articles

News
Floating point from scratch: Hard Mode
Reddit Programming • 6h ago

News
Using XSLT to analyse large XML datasets
Reddit Programming • 8h ago

News
Put your SSH keys in your TPM chip
Lobsters • 9h ago

News
Meet Kiki - an array language
Lobsters • 9h ago

News
Ursa - a new Iceberg-first storage engine for Kafka
Lobsters • 10h ago