
We set out to save money on observability. Instead, we (accidentally) rebuilt our incident response workflow.
The Gadget infrastructure team shares how using agents to migrate our observability stack to Grafana and ClickHouse ended up transforming how we handle production debugging and incident response.

This post was supposed to be about how we migrated Gadget’s observability stack from Axiom to ClickHouse and Grafana and saved a bunch of money. We did that, but the bigger story is how the collection of agent skills we built to help with the migration ended up fundamentally transforming how we approach incident response and production debugging at Gadget.

We will still talk a bit about the migration, what the process looked like, and some of the tools we used, but what we really want to share is how agents have become our primary mechanism for incident response. Over the course of a couple of weeks, we went from manually searching through logs, traces, and dashboards to delegating these investigative tasks to Claude. And it’s all thanks to the agent skills and MCP server we bu
Continue reading on Dev.to DevOps




