My Cluster's Haunted: A Story About Fighting Ghosts with Code
It starts with a Slack message from the product manager. "Hey, the new checkout flow... is it on or off in staging? It seems to disappear like every few hours." That's when your heart sinks. A flickering feature is so much worse than a broken one.

You check the site. They're right. The new feature is gone. But you swear it was there an hour ago.

First stop, git history. Our team uses GitOps, so the deployment YAML in the repo is the source of truth, right? Right? The feature flag, an environment variable ENABLE_NEW_CHECKOUT_FLOW, is set to "true". No recent commits. The GitOps dashboard is all green. The cluster is in sync. As far as it knows, everything is perfect.

Fine. You exec into a pod, print the env vars, and there it is: ENABLE_NEW_CHECKOUT_FLOW="false".

How? If the GitOps tool is your only means of managing the deployment, how can you be sure that the live state matches what is in Git? If what you see in Git is a "truth" but is different (and you don't kn
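The manual check in the story (read the flag in Git, print the env vars in the pod, compare) can be written down as a small script. This is only a sketch, assuming you have already copied the desired values out of the deployment YAML and captured the pod's env output as text. The function names parse_env_output and find_env_drift are made up for illustration and do not come from any GitOps tool.

```python
from typing import Dict, Optional, Tuple


def parse_env_output(text: str) -> Dict[str, str]:
    """Parse KEY=value lines (as printed by `env` inside a pod) into a dict."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:  # skip blank lines and lines without "="
            result[key.strip()] = value
    return result


def find_env_drift(
    desired: Dict[str, str], live: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (desired_value, live_value)} for each variable that differs.

    A live value of None means the variable is missing from the pod entirely.
    """
    return {
        name: (want, live.get(name))
        for name, want in desired.items()
        if live.get(name) != want
    }


# Hypothetical inputs: the value from Git, and env output pasted from the pod.
desired = {"ENABLE_NEW_CHECKOUT_FLOW": "true"}
live = parse_env_output("PATH=/usr/bin\nENABLE_NEW_CHECKOUT_FLOW=false\n")

drift = find_env_drift(desired, live)
print(drift)  # {'ENABLE_NEW_CHECKOUT_FLOW': ('true', 'false')}
```

Only the variables you list in desired are checked, so extra variables injected into the pod (like PATH above) do not show up as noise.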



