
I run 6 AI agents for my startup. Here's why I built an automatic kill switch for all of them.
I'm an AI safety researcher building and advising several startups. I study alignment because I don't trust prompts to keep agents safe. They're fragile, they degrade, and they depend on the agent choosing to obey. That's not safety. That's hope.

I run a fleet of OpenClaw agents for marketing, outreach, and feature development. They write content, analyze metrics, triage support tickets, and deploy code. And I am deeply uncomfortable relying on "please confirm before acting" as my only line of defense. I want my agents shut down before they break my rules or do something they can't take back. And when behavior drifts, I want to know before I'd ever think to check.

The incident that made me build this

You might have read about Summer Yue. She's Meta's Director of Alignment, and her own OpenClaw agent deleted over 200 of her emails. She'd told it to confirm before taking action, but the context got compacted mid-run and the instruction was lost. She had to physically run to her machine t…
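The core idea — enforcement that lives outside the prompt, so it survives context compaction — can be sketched as a guard that sits between the agent and its tools. This is a minimal illustration, not the author's actual system: the action names and the `Guard` class are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical set of actions treated as irreversible; in a real system
# this policy would live in code or config, never in the prompt.
IRREVERSIBLE = {"delete_email", "deploy_to_prod", "send_payment"}

class KillSwitch(Exception):
    """Raised when the guard halts the agent."""

@dataclass
class Guard:
    killed: bool = False

    def check(self, action: str) -> None:
        # Once tripped, the agent stays halted for every later call.
        if self.killed:
            raise KillSwitch(f"agent halted; refusing '{action}'")
        # Block irreversible actions before they execute, and trip the switch.
        if action in IRREVERSIBLE:
            self.killed = True
            raise KillSwitch(f"irreversible action '{action}' blocked; agent halted")

guard = Guard()
guard.check("read_inbox")  # reversible: allowed through
try:
    guard.check("delete_email")
except KillSwitch as e:
    print(e)  # the deletion never ran; the agent is now stopped
```

Because the check runs in ordinary code on every tool call, there is no instruction for the model to forget or to choose to ignore.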
Continue reading on Dev.to




