We lost $47K because of a single SQL query (the production disaster that taught me to always check query plans)

via Reddit Programming/u/Plenty-Disaster-1094

3:47 AM. Phone lights up. Production is down.

Payment processing failing. API response times: 200ms → 12 seconds. Database CPU: 100%. Connection pool: exhausted.

Friday night. Peak traffic. Our biggest sale of the quarter. By the time we recovered: $47,000 in lost revenue.

The cause? One line I added two weeks earlier:

WHERE user_verified = true

Looked fine. Tested in dev (500 rows). Tested in staging (100K rows). Production: 12 million rows. No index on user_verified.

Every admin dashboard refresh (every 30 seconds) = full table scan on 12M rows = 8-12 seconds per query = connection pool death = payment failures.

The fix was embarrassingly simple:

CREATE INDEX idx_user_verified ON users(user_verified);

Took 30 seconds. Query time: 8s → 12ms.

What I should have done:
- Run EXPLAIN ANALYZE on every query
- Test with production-scale data
- Monitor slow queries, not just endpoints
- Never deploy database changes before peak traffic

The real cost:
- $47K direct revenue loss
- 3 hours of degraded s…
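The failure chain above (a 30-second refresh, 8-12 second queries, an exhausted pool) is essentially Little's law: the average number of queries in flight equals the rate at which they arrive times how long each one runs. A back-of-the-envelope sketch, assuming a hypothetical 30 admin dashboards open at once (the post does not say how many there were) and 10 seconds as the midpoint of the 8-12 second range:

```python
def busy_connections(open_dashboards: int, refresh_interval_s: float, query_time_s: float) -> float:
    """Little's law: average queries in flight = arrival rate x time each query runs.

    Each open dashboard issues one query per refresh interval, and each query
    holds a pool connection for as long as it runs.
    """
    arrival_rate = open_dashboards / refresh_interval_s  # queries per second
    return arrival_rate * query_time_s


if __name__ == "__main__":
    # Hypothetical: 30 dashboards open at once; the post does not give a number.
    dashboards = 30
    for label, query_time_s in [("full scan (~10 s)", 10.0), ("indexed (12 ms)", 0.012)]:
        print(label, busy_connections(dashboards, 30.0, query_time_s))
```

Under those assumptions the full scan keeps about 10 connections busy on average just for dashboards, versus about 0.012 with the 12ms indexed query. The model is optimistic: it ignores the feedback loop the post describes, where a saturated CPU makes every query slower, payments included, which pushes the in-flight count up further.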
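Reading the query plan is what would have flagged the missing index before the deploy. The post does not name its database (EXPLAIN ANALYZE exists in PostgreSQL and MySQL), so this minimal sketch uses SQLite through Python's built-in sqlite3 module as a stand-in. Its EXPLAIN QUERY PLAN only describes the plan; unlike EXPLAIN ANALYZE, it does not run the query or report timings. Every column in the users table other than user_verified is hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    # Each plan row is (id, parent, notused, detail); only the detail text is kept.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical schema: only user_verified comes from the post.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, user_verified INTEGER)")
# SQLite stores booleans as integers, so 1 stands in for `true`.
query = "SELECT id, email FROM users WHERE user_verified = 1"

print(query_plan(conn, query))  # a SCAN of users: every row is read
conn.execute("CREATE INDEX idx_user_verified ON users(user_verified)")
print(query_plan(conn, query))  # a SEARCH of users using idx_user_verified
```

The thing to look for in review is the switch from a SCAN (read every row) to a SEARCH through the index. That check is cheap enough to run on every new WHERE clause, whichever database's EXPLAIN you use.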
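Dev at 500 rows and staging at 100K rows both hid the problem. Here is a minimal sketch of seeding a throwaway SQLite database at a larger scale and timing the same query with and without the index. The helper names seed_users and time_query, the 500,000-row demo size, and the 5% verified fraction are all hypothetical choices for illustration, and SQLite timings will not match the post's production database.

```python
import random
import sqlite3
import time


def seed_users(conn: sqlite3.Connection, n_rows: int, verified_fraction: float = 0.05, seed: int = 42) -> None:
    """Fill a hypothetical users table with n_rows synthetic rows.

    Roughly verified_fraction of rows get user_verified = 1; the RNG is seeded
    so runs are reproducible.
    """
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(id INTEGER PRIMARY KEY, email TEXT, user_verified INTEGER)"
    )
    rows = (
        (i, f"user{i}@example.com", int(rng.random() < verified_fraction))
        for i in range(n_rows)
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


def time_query(conn: sqlite3.Connection, sql: str, repeats: int = 5) -> tuple:
    """Run sql several times; return (best wall-clock seconds, rows returned)."""
    best, n_rows = float("inf"), 0
    for _ in range(repeats):
        start = time.perf_counter()
        n_rows = len(conn.execute(sql).fetchall())  # fetchall keeps row retrieval in the timing
        best = min(best, time.perf_counter() - start)
    return best, n_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Scale this toward production row counts; the post's table had 12 million rows.
    seed_users(conn, 500_000)
    sql = "SELECT id, email FROM users WHERE user_verified = 1"
    print("no index:  ", time_query(conn, sql))
    conn.execute("CREATE INDEX idx_user_verified ON users(user_verified)")
    print("with index:", time_query(conn, sql))
```

Try raising verified_fraction: when most rows match, fetching them one by one through the index stops paying off and the gap narrows. Cost-based planners such as PostgreSQL's account for this and may pick a full scan even when an index exists, which is one more reason to read the plan at realistic data sizes.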
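Endpoint latency showed the symptom (200ms → 12 seconds) but not which query caused it. On PostgreSQL, the server-side tools for this are the log_min_duration_statement setting and the pg_stat_statements extension. As an application-side alternative, here is a minimal sketch of a hypothetical SlowQueryLogger wrapper that times each statement and logs any that cross a threshold. The 0.5-second default is an arbitrary assumption, and SQLite again stands in for the real database.

```python
import logging
import sqlite3
import time

logger = logging.getLogger("slow_queries")


class SlowQueryLogger:
    """Hypothetical wrapper: times each statement and logs the slow ones."""

    def __init__(self, conn: sqlite3.Connection, threshold_s: float = 0.5):
        self.conn = conn
        self.threshold_s = threshold_s  # arbitrary default; tune to your latency budget
        self.slow = []  # (elapsed_seconds, sql) for every statement over the threshold

    def execute(self, sql: str, params: tuple = ()) -> list:
        start = time.perf_counter()
        # sqlite3 cursors hand rows back lazily, so fetchall() keeps retrieval inside the timing.
        rows = self.conn.execute(sql, params).fetchall()
        elapsed = time.perf_counter() - start
        if elapsed >= self.threshold_s:
            self.slow.append((elapsed, sql))
            logger.warning("slow query (%.3f s): %s", elapsed, sql)
        return rows


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    db = SlowQueryLogger(sqlite3.connect(":memory:"), threshold_s=0.5)
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_verified INTEGER)")
    print(db.execute("SELECT COUNT(*) FROM users WHERE user_verified = ?", (1,)))
```

Logging the SQL text, rather than only the endpoint, is the point: an alert naming the dashboard query would have pointed straight at the missing index.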
