
The production disasters we've watched happen, and the habit that would have prevented all of them
The Tuesday the database got smaller

A client of ours ran a loyalty program with about 120,000 members. On a Tuesday afternoon their agency pushed a "cleanup migration" to production. The intent was to merge duplicate accounts where the same email had signed up twice with different casing.

The script ran, the dashboard was snappier than usual, and someone in the client's marketing team noticed by Wednesday morning that roughly 40,000 members had vanished from the list. The migration had matched on normalized email, yes, but it had also silently deleted the "loser" row instead of merging the points balances first. There was no soft delete. There was no dry run log. The backup was 19 hours old, which meant a full day of new signups and point redemptions was gone by the time anyone restored it.

The agency's post-mortem used the word "oversight" four times. The real word is "untested". Nobody had run the script against a production-shaped dataset before Tuesday. The staging DB had 300 rows.
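To make the failure concrete, here is a minimal sketch of what a safer version of that migration could have looked like. The schema (`members` with `id`, `email`, `points`, `deleted_at`) and the function name are hypothetical, not the agency's actual code; the point is the shape: merge balances into a winner, soft-delete the losers, and default to a dry run that only reports the plan.

```python
import sqlite3

def merge_duplicate_members(conn, dry_run=True):
    """Merge member rows whose emails differ only by casing.

    Points are summed into the oldest account; duplicates are
    soft-deleted (deleted_at set), never removed. With
    dry_run=True, nothing is written -- the plan is returned
    so it can be inspected against a production-shaped dataset.
    """
    cur = conn.cursor()
    # Find groups of live rows that collide on normalized email.
    cur.execute("""
        SELECT LOWER(email) AS norm, COUNT(*) AS n
        FROM members
        WHERE deleted_at IS NULL
        GROUP BY norm
        HAVING n > 1
    """)
    plan = []
    for norm, _count in cur.fetchall():
        rows = cur.execute(
            "SELECT id, points FROM members "
            "WHERE LOWER(email) = ? AND deleted_at IS NULL "
            "ORDER BY id",
            (norm,),
        ).fetchall()
        winner_id = rows[0][0]            # oldest account wins
        loser_ids = [r[0] for r in rows[1:]]
        total_points = sum(points for _id, points in rows)
        plan.append((norm, winner_id, loser_ids, total_points))

    if dry_run:
        return plan  # the "dry run log" that was missing

    for norm, winner_id, loser_ids, total_points in plan:
        # Merge balances FIRST, then soft-delete the losers.
        cur.execute(
            "UPDATE members SET points = ? WHERE id = ?",
            (total_points, winner_id),
        )
        qmarks = ",".join("?" * len(loser_ids))
        cur.execute(
            f"UPDATE members SET deleted_at = datetime('now') "
            f"WHERE id IN ({qmarks})",
            loser_ids,
        )
    conn.commit()
    return plan
```

Run with `dry_run=True` against a copy of production data, diff the plan against expectations, and only then run it for real. A soft delete also means the "losers" are still recoverable without reaching for a 19-hour-old backup.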
