
How Kubernetes Drift Detection Saved Us From Infrastructure Chaos
Three months into a production migration, we discovered that 14 of our 47 deployments had quietly drifted from their declared state. Not in a dramatic, pager-firing way. In the slow, invisible way that turns a Tuesday afternoon into a Friday incident.

That's the thing about configuration drift. It doesn't announce itself. It accumulates. Here's what happened, what we built to fix it, and why I think most teams are one bad deploy away from the same problem.

The Setup

We were running a mid-sized Kubernetes cluster across three environments: dev, staging, and production. Standard GitOps workflow. ArgoCD handling deployments. Helm charts checked into Git. Everything was "declarative." Everything was "source-of-truth."

Except it wasn't. Engineers were patching things manually under pressure. kubectl edit became a habit. Resource limits got tweaked directly on pods. ConfigMaps were updated in-cluster without touching the repo. Nobody flagged it because nothing broke. The cluster kept humming.
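To make that failure mode concrete, here is a minimal sketch of how drift like this can be surfaced: render the chart the same way the GitOps tool would, then diff the result against the live cluster. The release name myapp, the chart path, and the values file are placeholders for illustration, not the actual setup from this post.

```bash
# Render the chart as it exists in Git, then diff against live cluster state.
# "myapp", "./charts/myapp", and the values file are hypothetical placeholders.
helm template myapp ./charts/myapp \
  --namespace production \
  -f values-production.yaml \
  | kubectl diff -f -

# If ArgoCD manages the app, it can report the same divergence directly:
argocd app diff myapp
```

kubectl diff exits with code 1 when live state differs from the rendered manifests (and 0 when it matches), so either command drops straight into a CI job or a periodic check that flags drift before it turns into that Friday incident.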