I kept getting blamed for outages that weren't mine. So I built a tool to fight back

via Dev.to Webdev, by Oleg Glybchenko

The villain origin story

December 2024. 11 PM. I'm on the couch. Phone buzzes. "Hey, the AI feature is broken."

I check our dashboards. Everything's green. Our servers are fine. Our database is fine. Our CDN is fine.

OpenAI is down. Not us. OpenAI.

Their status page? Still showing "All Systems Operational." It took them over an hour to even acknowledge it. By then I'd already gotten 14 messages from users who thought we broke something.

Two weeks later, same thing. OpenAI down again. 4+ hours. Same dance. Same blame.

I started to notice a pattern.

The pattern

Every monitoring tool I've ever used (Pingdom, UptimeRobot, Datadog) answers the same question: is MY site up?

That's... not the question anymore.

Your site is up. Your checkout is up. Your auth is up. But Stripe's API is returning 500s, so your checkout silently fails. Twilio is lagging, so your 2FA codes arrive 30 minutes late. OpenAI is down, so every AI feature you've built returns a loading spinner forever.

Your in

