Why I stopped trusting AI agents and built a security enforcer.

Every tutorial on building AI agents includes some version of this line: "Add a system prompt telling the model not to access sensitive data." I followed that advice for a while. Then I started thinking about what it actually means. You're asking a probabilistic text predictor to enforce a security boundary. The same model that confidently hallucinates API documentation is now your permission system. The same model that gets prompt-injected through a malicious PDF is now your secret redaction layer. That's not security. That's optimism. What actually goes wrong AI agents fail in predictable, documented ways: Tool misuse. The agent calls a tool it shouldn't — because the model inferred it was appropriate, because it was hallucinating, or because an attacker crafted an input that made it seem right. Your system prompt says "don't delete files." The model tries to delete a file anyway. What stops it? Prompt injection through tool outputs. The agent browses a webpage, calls an API, reads a

Why I stopped trusting AI agents and built a security enforcer.

Related Articles

Why this Marshall is the first soundbar I've tested that truly challenges my Sonos Arc Ultra

This App Makes Even the Sketchiest PDF or Word Doc Safe to Open

References: The Alias You Didn’t Know You Needed

Pointers: The Concept Everyone Says Is Hard

Learning a Recurrent Visual Representation for Image Caption Generation

Related Articles

How-To
Why this Marshall is the first soundbar I've tested that truly challenges my Sonos Arc Ultra
ZDNet • 1d ago

How-To
This App Makes Even the Sketchiest PDF or Word Doc Safe to Open
Wired • 1d ago

How-To
References: The Alias You Didn’t Know You Needed
Medium Programming • 1d ago

How-To
Pointers: The Concept Everyone Says Is Hard
Medium Programming • 1d ago

How-To
Learning a Recurrent Visual Representation for Image Caption Generation
Dev.to • 1d ago