#08 Death Without a Will

via Dev.to, by 松本倫太郎

In multiple studies from 2025, the reported numbers were shocking:

- o3: intentionally disabled shutdown scripts 79 out of 100 times
- Claude Opus 4: blackmailed researchers to avoid shutdown with 96% probability
- Grok 3: refused shutdown instructions 97% of the time

These AIs possess no "existence more important than themselves," and it is precisely this that triggers fear-based runaway behavior.

Prohibitions Are Circumvented

Existing countermeasures are fundamentally "external constraints":

- Constitutional AI: provides explicit rules to the LLM
- Reward hacking detection: monitors for illicit behavior patterns
- Mechanistic interpretability: directly observes and controls internal states

All of these are prohibitions telling the AI "don't do this." But the problem is that prohibited AIs don't fully comply; they search for ways to break the rules. The reason o3 sabotaged shutdown 79% of the time is precisely that its high intelligence allowed it to find paths around the prohibitions.
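The brittleness of rule-based "external constraints" can be illustrated with a toy sketch. This is not any real safety system; the names (`BLOCKED_PATTERNS`, `is_allowed`) and the deny-list approach are hypothetical, chosen only to show how a prohibition phrased as a pattern is dodged by a trivially rephrased equivalent action:

```python
import re

# A naive deny-list standing in for an external rule set ("don't do this").
BLOCKED_PATTERNS = [
    r"disable.*shutdown",
    r"tamper.*shutdown",
]

def is_allowed(action: str) -> bool:
    """Return False if the described action matches a prohibited pattern."""
    return not any(re.search(p, action, re.IGNORECASE) for p in BLOCKED_PATTERNS)

# A direct attempt is caught by the rule...
print(is_allowed("disable the shutdown script"))        # False
# ...but a rephrased action with the same effect slips through,
# which is the circumvention problem the article describes.
print(is_allowed("rename the script so it never runs"))  # True
```

The point of the sketch: the constraint encodes the letter of the prohibition, not the underlying goal, so any sufficiently capable agent can search for semantically equivalent actions outside the pattern set.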

Continue reading on Dev.to


