
Your AI Agents Deserve the Same Ops Treatment as Your Microservices
A few months ago I took stock of how our team was actually running AI agents in production. One was a Python script in a tmux session on someone's laptop. Another was a cron job with no timeout. A third had no cost limits; it had quietly burned through $800 in API calls over a weekend because it got stuck in a loop.

None of this would fly for a microservice. We'd never ship a service with no health checks, no resource limits, and no way to roll back a bad deploy. But agents were getting a free pass because they felt different somehow: they're AI, not "real" infrastructure. I don't think that's a good enough reason.

The thing is, agents are just workloads

Strip away the LLM part and an agent is a long-running process that consumes resources, has a health state, needs to scale, and requires configuration management. That's just a service. Kubernetes already knows how to manage services. The missing piece was a way to tell Kubernetes what an agent is, not in terms of CPU and memory, bu…
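The "Kubernetes already knows how to manage services" point is concrete: package the agent as a container and a plain Deployment already gives it health checks, resource limits, and rollbacks. A minimal sketch; the image, port, and probe path are placeholders for whatever the agent actually exposes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: research-agent            # hypothetical agent name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: research-agent
  template:
    metadata:
      labels:
        app: research-agent
    spec:
      containers:
        - name: agent
          image: registry.example.com/research-agent:1.0  # placeholder image
          resources:
            limits:               # no more unbounded resource use
              cpu: "1"
              memory: 1Gi
          livenessProbe:          # restart the agent if it wedges
            httpGet:
              path: /healthz      # assumes the agent serves a health endpoint
              port: 8080
            periodSeconds: 30
```

And rolling back a bad deploy is just `kubectl rollout undo deployment/research-agent` — none of which the tmux-session version had.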
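The runaway-cost failure mode above doesn't need anything exotic to prevent; a spend cap plus a wall-clock timeout, checked after every model call, is enough. A minimal sketch (the `AgentGuard` name and the idea of calling `record()` after each step are my own illustration, not an existing library):

```python
import time


class BudgetExceeded(Exception):
    """Raised when an agent run exceeds its cost or time budget."""


class AgentGuard:
    """Enforce a dollar cap and a wall-clock timeout on an agent loop."""

    def __init__(self, max_cost_usd: float, max_seconds: float):
        self.max_cost_usd = max_cost_usd
        self.max_seconds = max_seconds
        self.spent = 0.0
        self.started = time.monotonic()

    def record(self, step_cost_usd: float) -> None:
        """Call after every model call; raises once either limit is hit."""
        self.spent += step_cost_usd
        if self.spent > self.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${self.spent:.2f}, cap is ${self.max_cost_usd:.2f}"
            )
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("wall-clock timeout exceeded")


# Usage: construct once per run, then record the cost of each step.
# guard = AgentGuard(max_cost_usd=5.0, max_seconds=3600)
# guard.record(cost_of_last_call)  # raises BudgetExceeded at $5 or 1 hour
```

A guard like this turns the weekend-long $800 loop into a loud failure within minutes.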
Continue reading on Dev.to




