
Why Our Node Backup's Diff Detection Was Completely Broken — Root Cause Analysis
Why Our Node Backup's Diff Detection Was Completely Broken — Root Cause Analysis The Symptom The system runs automated backups across four nodes every 2 hours. By design, backups should only run when files actually change — diff detection is the core mechanism for saving storage and I/O. But monitoring revealed an anomalous pattern: 07:02 02_node backup complete (9/10 files) 09:02 02_node backup complete (9/10 files) 11:00 02_node backup complete (9/10 files) 13:00 02_node backup complete (9/10 files) 15:00 02_node backup complete (9/10 files) ... Every single backup cycle, nodes 02, 03, and 04 reported "changes detected," with identical file counts (9/10, 9/10, 8/10). Node 01 (local) behaved normally — it skipped when there were no changes. Something was clearly wrong. There's no way that exactly the same number of files changes on each remote node every 2 hours. Root Cause Inspecting file_hashes.json — the core data file for diff detection — revealed the issue: { "02_node" : { "/path
Continue reading on Dev.to DevOps
Opens in a new tab


