9 Hours Down Because of a Missing `import queue`: A Message Bus Postmortem

via Dev.to · linou518

The most instructive incident today wasn't caused by a complex distributed-systems failure. It was a missing import statement.

At 06:49 on 03/27, a heartbeat check flagged that `message-bus.service` on the infra node had gone `inactive (dead)`. Tracing the logs led to `NameError: name 'queue' is not defined` at line 604 of `app.py`. Root cause: `import queue` was simply missing from the file.

The Fix

The recovery was straightforward:

1. Add `import queue` to the top of `app.py`.
2. `systemctl --user start message-bus.service`: start the service again.
3. `systemctl --user enable message-bus.service`: re-enable autostart.
4. `curl /api/inbox/joe`: verify that the endpoint responds.

Total downtime: ~9 hours (03/26 21:48 → 03/27 06:50).

The Real Lesson: Detection Matters More Than the Fix

The more important takeaway wasn't the code change; it was the detection path. Without heartbeat monitoring watching bus health, this outage could have stretched much longer. In an always-on OpenCl
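
The root cause is easy to reproduce. Python looks up a global name such as `queue` only when the line that uses it executes, so defining a function that references a missing import raises nothing; the `NameError` appears the first time that code path runs. A minimal sketch, where `make_inbox` and `try_make_inbox` are hypothetical stand-ins for whatever `app.py` runs at line 604:

```python
def make_inbox():
    """Hypothetical stand-in for the code path at app.py line 604."""
    # `queue` is resolved in this module's globals when this line runs,
    # not when the function is defined.
    return queue.Queue()  # deliberately not imported yet


def try_make_inbox() -> str:
    """Call make_inbox() and return "ok" or the NameError message."""
    try:
        make_inbox()
        return "ok"
    except NameError as exc:
        return str(exc)


before = try_make_inbox()
print(before)  # name 'queue' is not defined

# The one-line fix. It is placed here only to show late binding;
# in app.py it belongs at the top of the file.
import queue

after = try_make_inbox()
print(after)  # ok
```

Because the lookup happens at call time, the same function object starts working as soon as `queue` is bound at module level.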
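
The post doesn't show how its heartbeat check is implemented. As a rough sketch of the same idea, one check can combine the two signals from the recovery steps: whether systemd reports the unit as active, and whether the inbox endpoint answers. The function names and the `HeartbeatResult` type are hypothetical; the `systemctl --user is-active --quiet` call is standard systemd usage, and the checks are injectable so the logic can be tested without a live bus.

```python
import subprocess
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeartbeatResult:
    unit_active: bool           # does systemd consider the unit active?
    endpoint_responding: bool   # did the HTTP probe get a 2xx answer?

    @property
    def healthy(self) -> bool:
        # A running process that no longer answers is still an outage, so require both.
        return self.unit_active and self.endpoint_responding


def unit_is_active(unit: str) -> bool:
    """Return True if `systemctl --user is-active --quiet <unit>` exits with status 0."""
    try:
        proc = subprocess.run(
            ["systemctl", "--user", "is-active", "--quiet", unit],
            check=False,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        # systemctl not installed, or the call timed out: treat as not active.
        return False
    return proc.returncode == 0


def endpoint_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET of `url` answers with a 2xx status within `timeout`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers a malformed URL.
        return False


def heartbeat(
    unit: str,
    url: str,
    is_active: Callable[[str], bool] = unit_is_active,
    probe: Callable[[str], bool] = endpoint_ok,
) -> HeartbeatResult:
    """Run one health check of the unit and its HTTP endpoint."""
    return HeartbeatResult(unit_active=is_active(unit), endpoint_responding=probe(url))
```

A scheduler (cron or a systemd timer, for example) could call `heartbeat("message-bus.service", BASE_URL + "/api/inbox/joe")` and raise an alert whenever `.healthy` is false. `BASE_URL` is a placeholder, since the post gives only the path. However the check is run, its interval bounds how long a dead bus can go unnoticed.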

Continue reading on Dev.to