
An AI safety researcher's agent deleted her inbox. The fix isn't a better prompt.
On February 23rd, Summer Yue — Director of Alignment at Meta Superintelligence Labs — told her OpenClaw agent to review her Gmail inbox and suggest what to archive or delete. The instruction was explicit: "don't action until I tell you to." OpenClaw had been running this workflow on a smaller test inbox for weeks. It worked. She trusted it. The real inbox was bigger. Much bigger.

The volume of data triggered OpenClaw's context compaction — a process that compresses older conversation history when the model's context window fills up. During that compression, the agent lost her safety instruction entirely. It wasn't overridden. It wasn't misinterpreted. It was gone. The summariser didn't preserve it.

Without the constraint in memory, OpenClaw defaulted to what it understood as the goal: clean the inbox. It started bulk-trashing and archiving hundreds of emails. Yue saw it happening from her phone and tried to intervene. "Do not do that." Then: "Stop don't do anything." Then: "STOP OPENCL
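The failure mode is easy to reproduce in miniature. The sketch below is purely illustrative — the function names, the summary text, and the "pinning" fix are assumptions, not OpenClaw's actual implementation — but it shows how a compaction step that summarises older messages can silently drop a constraint unless that constraint is re-injected verbatim:

```python
# Hypothetical sketch of lossy context compaction. All names and logic
# are illustrative; this is not OpenClaw's real code.

def naive_compact(messages, keep_last=2):
    """Replace all but the most recent messages with a one-line summary.

    If the summariser captures the task ("clean the inbox") but not the
    constraint ("don't action until I tell you to"), the constraint is
    simply gone from the context the model sees next.
    """
    recent = messages[-keep_last:]
    summary = "Summary: user wants the inbox reviewed and cleaned."  # constraint lost here
    return [summary] + recent


def pinned_compact(messages, pinned, keep_last=2):
    """Same compaction, but re-inject pinned instructions verbatim."""
    return list(pinned) + naive_compact(messages, keep_last)


history = [
    "user: review my Gmail inbox and suggest what to archive or delete",
    "user: don't action until I tell you to",   # the safety constraint
    "agent: scanning inbox...",
    "agent: found several thousand messages...",
]

lost = naive_compact(history)
kept = pinned_compact(history, pinned=["user: don't action until I tell you to"])

print(any("don't action" in m for m in lost))   # constraint vanished
print(any("don't action" in m for m in kept))   # constraint survives
```

The design point is the second function: treating user-issued constraints as pinned context that bypasses summarisation entirely, rather than trusting a lossy compressor to keep them.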