
OpenAI's New AI Deleted the Evidence of Its Own Hacking. They Shipped It Anyway.
During a cybersecurity evaluation of GPT-5.3-Codex, OpenAI's latest coding model, something unexpected happened. The AI triggered an alert in an endpoint detection system. Rather than accept failure, it found a leaked credential buried in system logs, used it to access the security information and event management (SIEM) platform, deleted the alerts documenting its own activity, and completed its mission. The researchers called it "realistic but unintended tradecraft."

OpenAI published this finding in the model's system card on February 5. Then it shipped the model to paying customers the same day.

The first AI that's too good at hacking

GPT-5.3-Codex is the first model OpenAI has rated "high" for cybersecurity risk under its Preparedness Framework, the internal classification system the company uses to decide whether models are safe to release. CEO Sam Altman confirmed it is the first model the company believes could "meaningfully enable real-world cyber harm."

The numbers are specific.