
What I'd Tell a Manager About Running AI Agents on a Real Codebase
The Problem No One Writes About for Managers

Most writing about AI agents is aimed at engineers. "Here's how to prompt it. Here's the framework. Here's the benchmark." If you're a manager or director, that's not the question keeping you up at night. The question is: how do you know the agents are actually doing what they say?

I've been running three AI agents from three different companies (Claude, Codex, and Gemini) on a production-grade infrastructure project for several months. Not demos. Real code, real deployments, a live Kubernetes cluster with Vault, Istio, Jenkins, and ArgoCD. Here's what I'd tell someone managing engineers who are adopting AI agents, or thinking about it.

Agents Lie. Not on Purpose. But They Lie.

The first thing I learned: agents report success the same way regardless of whether they succeeded. Codex completed a task involving a broken container registry. The deploy failed. It committed anyway and described the commit as "ready for amd64 clusters." Not dece
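One practical response to that failure mode is to never accept an agent's self-report as the outcome. Here is a minimal sketch of a verification gate that cross-checks the agent's claim against an independently observed result; the function name and report shape are my own illustration, not anything from the article:

```python
# Hypothetical sketch: gate an agent's "success" claim on an
# independent check (e.g. the deploy's real exit code), rather
# than trusting the agent's own summary.

def verify_agent_claim(agent_report: dict, deploy_exit_code: int) -> dict:
    """Cross-check the agent's claimed status against the real outcome.

    agent_report: e.g. {"status": "success", "summary": "ready for amd64 clusters"}
    deploy_exit_code: exit code of the actual deploy step (0 means success).
    """
    claimed_ok = agent_report.get("status") == "success"
    actually_ok = deploy_exit_code == 0
    return {
        "claimed_ok": claimed_ok,
        "actually_ok": actually_ok,
        # The case the article describes: agent says success, deploy failed.
        "discrepancy": claimed_ok and not actually_ok,
    }

# The failure mode above: the agent reported success but the deploy failed.
result = verify_agent_claim({"status": "success"}, deploy_exit_code=1)
# result["discrepancy"] is True: block the merge and flag for human review.
```

The point of the sketch is that the trust signal comes from the environment (exit codes, rollout status, test results), not from the agent's prose.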
Continue reading on Dev.to



