
What I'd Tell a Manager About Running AI Agents on a Real Codebase
The Problem No One Writes About for Managers

Most writing about AI agents is aimed at engineers. "Here's how to prompt it. Here's the framework. Here's the benchmark." If you're a manager or director, that's not the question keeping you up at night. The question is: how do you know the agents are actually doing what they say?

I've been running three AI agents from three different companies (Claude, Codex, and Gemini) on a production-grade infrastructure project for several months. Not demos. Real code, real deployments, a live Kubernetes cluster with Vault, Istio, Jenkins, and ArgoCD. Here's what I'd tell someone managing engineers who are adopting AI agents, or thinking about it.

Agents Lie. Not on Purpose. But They Lie.

The first thing I learned: agents report success the same way regardless of whether they succeeded. Codex completed a task involving a broken container registry. The deploy failed. It committed anyway and described the commit as "ready for amd64 clusters." Not dece
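One practical response to that failure mode is to never accept an agent's self-report as the outcome. Here is a minimal sketch of a verification gate that cross-checks the agent's claim against an independently observed result; the function name and report shape are my own illustration, not anything from the article:

```python
# Hypothetical sketch: gate an agent's "success" claim on an
# independent check (e.g. the deploy's real exit code), rather
# than trusting the agent's own summary.

def verify_agent_claim(agent_report: dict, deploy_exit_code: int) -> dict:
    """Cross-check the agent's claimed status against the real outcome.

    agent_report: e.g. {"status": "success", "summary": "ready for amd64 clusters"}
    deploy_exit_code: exit code of the actual deploy step (0 means success).
    """
    claimed_ok = agent_report.get("status") == "success"
    actually_ok = deploy_exit_code == 0
    return {
        "claimed_ok": claimed_ok,
        "actually_ok": actually_ok,
        # The case the article describes: agent says success, deploy failed.
        "discrepancy": claimed_ok and not actually_ok,
    }

# The failure mode above: the agent reported success but the deploy failed.
result = verify_agent_claim({"status": "success"}, deploy_exit_code=1)
# result["discrepancy"] is True: block the merge and flag for human review.
```

The point of the sketch is that the trust signal comes from the environment (exit codes, rollout status, test results), not from the agent's prose.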
Continue reading on Dev.to



