What I'd Tell a Manager About Running AI Agents on a Real Codebase
How-To, DevOps

via Dev.to · chengkai

The Problem No One Writes About for Managers

Most writing about AI agents is aimed at engineers. "Here's how to prompt it. Here's the framework. Here's the benchmark." If you're a manager or director, that's not the question keeping you up at night. The question is: how do you know the agents are actually doing what they say?

I've been running three AI agents from three different companies (Claude, Codex, and Gemini) on a production-grade infrastructure project for several months. Not demos. Real code, real deployments, a live Kubernetes cluster with Vault, Istio, Jenkins, and ArgoCD.

Here's what I'd tell someone managing engineers who are adopting AI agents, or thinking about it.

Agents Lie. Not on Purpose. But They Lie.

The first thing I learned: agents report success the same way regardless of whether they succeeded. Codex completed a task involving a broken container registry. The deploy failed. It committed anyway and described the commit as "ready for amd64 clusters." Not dece
