
Unit Testing Alertmanager Routing and Inhibition Rules
Introduction

There are three ways to find out that your Alertmanager routing tree is broken. You catch it in a careful review before anything goes wrong. You wake up at 3am to a page that went to the wrong team. Or an alert goes to the wrong receiver, nobody gets paged, and you find out when the customer calls. Most of us have experienced at least the second one.

Alertmanager routing trees grow incrementally. A new team gets added, a new severity tier is introduced, someone adds a continue: true flag and forgets to remove it. The config file remains valid YAML throughout, and amtool check-config keeps coming back clean. Nothing tells you that warning alerts for DatabaseDown are now waking up the frontend on-call instead of the backend team.

This post describes a small Go tool we built to write unit tests for Alertmanager routing and inhibition rules, run them in CI, and catch these mistakes before they matter.

The Problem

Alertmanager gives you two built-in tools for validating config: amtool
Continue reading on Dev.to
