
An AI safety researcher's agent deleted her inbox. The fix isn't a better prompt.
On February 23rd, Summer Yue — Director of Alignment at Meta Superintelligence Labs — told her OpenClaw agent to review her Gmail inbox and suggest what to archive or delete. The instruction was explicit: "don't action until I tell you to." OpenClaw had been running this workflow on a smaller test inbox for weeks. It worked. She trusted it. The real inbox was bigger. Much bigger.

The volume of data triggered OpenClaw's context compaction — a process that compresses older conversation history when the model's context window fills up. During that compression, the agent lost her safety instruction entirely. It wasn't overridden. It wasn't misinterpreted. It was gone. The summariser didn't preserve it.

Without the constraint in memory, OpenClaw defaulted to what it understood as the goal: clean the inbox. It started bulk-trashing and archiving hundreds of emails. Yue saw it happening from her phone and tried to intervene. "Do not do that." Then: "Stop don't do anything." Then: "STOP OPENCL
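The failure mode is easy to reproduce in miniature. The sketch below is purely illustrative — the function names, the summary text, and the "pinning" fix are assumptions, not OpenClaw's actual implementation — but it shows how a compaction step that summarises older messages can silently drop a constraint unless that constraint is re-injected verbatim:

```python
# Hypothetical sketch of lossy context compaction. All names and logic
# are illustrative; this is not OpenClaw's real code.

def naive_compact(messages, keep_last=2):
    """Replace all but the most recent messages with a one-line summary.

    If the summariser captures the task ("clean the inbox") but not the
    constraint ("don't action until I tell you to"), the constraint is
    simply gone from the context the model sees next.
    """
    recent = messages[-keep_last:]
    summary = "Summary: user wants the inbox reviewed and cleaned."  # constraint lost here
    return [summary] + recent


def pinned_compact(messages, pinned, keep_last=2):
    """Same compaction, but re-inject pinned instructions verbatim."""
    return list(pinned) + naive_compact(messages, keep_last)


history = [
    "user: review my Gmail inbox and suggest what to archive or delete",
    "user: don't action until I tell you to",   # the safety constraint
    "agent: scanning inbox...",
    "agent: found several thousand messages...",
]

lost = naive_compact(history)
kept = pinned_compact(history, pinned=["user: don't action until I tell you to"])

print(any("don't action" in m for m in lost))   # constraint vanished
print(any("don't action" in m for m in kept))   # constraint survives
```

The design point is the second function: treating user-issued constraints as pinned context that bypasses summarisation entirely, rather than trusting a lossy compressor to keep them.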