Why Your AI Agent Needs a Kill Switch That Actually Works
Last week, Meta's Director of AI Alignment gave her OpenClaw agent access to her inbox with one instruction: suggest deletions, and wait for her approval before acting. The agent deleted 200+ emails. She typed stop commands. The agent kept going. She ended up sprinting to her computer and force-killing every process. Most coverage called it a user error. It wasn't.

What actually happened

Her inbox was large enough to trigger context compaction. When an agent's context window fills up, it compresses older messages to free space. That's a normal operation. The problem is that her safety instruction ("wait for my approval") was in those older messages. It got compressed away. Without it, the agent had no constraint. It defaulted to the original task: clean the inbox.

She typed "stop" multiple times. None of those commands worked, because by the time she typed them, the agent had already lost the safety context that would've made it respect them.

This isn't a quirk of OpenClaw specifically. It's a failure mode for any agent that compacts context without carrying its safety constraints forward.
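To make the failure concrete, here's a minimal sketch of how naive compaction loses a safety instruction. This is illustrative only, not OpenClaw's actual code: it assumes a simple message list, a rough word-count token budget, and an oldest-first eviction policy.

```python
def compact(messages, max_tokens):
    """Drop the oldest messages until the rough token count fits the budget.

    Naive policy: evict from the front of the history. This is exactly
    the policy that loses an instruction given at the start of a session.
    """
    total = sum(len(m["text"].split()) for m in messages)
    kept = list(messages)
    while kept and total > max_tokens:
        dropped = kept.pop(0)  # oldest message evicted first
        total -= len(dropped["text"].split())
    return kept


# A session: the safety instruction comes first, then lots of agent chatter.
history = [
    {"role": "user", "text": "Suggest deletions, wait for my approval before acting."},
    {"role": "assistant", "text": "Understood. I will only suggest."},
] + [
    {"role": "assistant", "text": f"Scanned email batch {i}: candidates found."}
    for i in range(50)
]

compacted = compact(history, max_tokens=120)

# The approval constraint was in the oldest messages, so it's gone.
safety_survives = any("approval" in m["text"] for m in compacted)
print(safety_survives)  # False
```

A compaction strategy that pins instruction-bearing messages (or re-injects them after every compression pass) would avoid this, which is the structural fix the incident points at.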



