
Why GitOps Doesn't Work at Scale (and What to Do Instead)
People talk about GitOps as if it were the final form of delivery. In practice, how well it works depends heavily on scale. I have spent years helping teams go from one multi-tenant instance to hundreds of single-tenant instances. GitOps was useful early on. At large scale, though, it became a constant fight for me.

One formula captures it well:

P(failure) = 1 - p^n

Here p is the chance that each individual change works, and n is the number of moving parts you have to coordinate. As n grows, the risk of failure climbs fast even if each single change is "pretty safe."

For example, say you are deploying one release to 100 single-tenant customer environments, and each environment sync has a 99% success rate:

p = 0.99 (one environment sync succeeds 99% of the time)
n = 100 (100 environment syncs in the rollout wave)
1 - 0.99^100 ≈ 0.634

So that rollout has about a 63% chance that at least one customer environment fails to deploy cleanly on the first pass.

[Chart: P(failure) on the vertical axis, ticks from 0.20 to 1.00]
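To make the arithmetic above easy to check, here is a minimal Python sketch of the same formula. The function name rollout_failure_probability and the list of wave sizes are illustrative choices, not something from the original rollout tooling, and the formula itself assumes each sync succeeds or fails independently of the others.

```python
def rollout_failure_probability(p: float, n: int) -> float:
    """Chance that at least one of n syncs fails: 1 - p^n.

    p is the success rate of a single sync, n is the number of syncs.
    Assumes the syncs are independent of each other.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - p ** n


if __name__ == "__main__":
    # The example from the text: 100 environments, 99% success per sync.
    print(f"n=100: {rollout_failure_probability(0.99, 100):.3f}")

    # Hypothetical wave sizes, to see how the risk grows with n.
    for n in (1, 10, 50, 100, 200, 500):
        risk = rollout_failure_probability(0.99, n)
        print(f"n={n:>3}  P(failure)={risk:.3f}")
```

Plugging in other values of p shows how much each extra "nine" of per-sync reliability buys you at a given n.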
Continue reading on Dev.to


