
GPT-5, Claude, Gemini All Score Below 1% - ARC AGI 3 Just Broke Every Frontier Model
ARC-AGI-3, launched just yesterday on March 25, 2026, represents the most radical transformation of the ARC benchmark since François Chollet introduced it in 2019: it abandons static grid puzzles entirely in favor of interactive, video-game-like environments where AI agents must discover rules, set goals, and solve problems with zero instructions. The competition carries over $2 million in prizes across three tracks. Early preview results: frontier LLMs like GPT-5 and Claude score below 1%, while simple CNN and graph-search approaches reach 12.58%. The gap between human performance (100%) and the best AI agent remains enormous.

From grid puzzles to game worlds: what changed

ARC-AGI-3 is not an incremental difficulty upgrade; it is a fundamentally different benchmark. Previous versions (ARC-AGI-1 and ARC-AGI-2) presented static input-output grid pairs where systems inferred transformation rules and applied them. ARC-AGI-3 instead drops agents into turn-based game environments with no
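The turn-based setup described above can be sketched as a plain observe-act loop. Everything here is a hypothetical stand-in, not the actual ARC-AGI-3 API: the `ToyGridEnv` class, its observation format, and its action set are invented for illustration, assuming only that an agent receives raw observations and a fixed action set with no stated rules or goals.

```python
import random

class ToyGridEnv:
    """Hypothetical turn-based environment in the spirit of ARC-AGI-3:
    the agent sees only a grid and a fixed action set, with no
    instructions about the rules or the goal."""

    ACTIONS = ("up", "down", "left", "right")

    def __init__(self, size=5):
        self.size = size
        self.agent = [0, 0]
        self.goal = [size - 1, size - 1]  # never told to the agent

    def observe(self):
        # The agent gets raw cell values, not the rules behind them.
        grid = [[0] * self.size for _ in range(self.size)]
        grid[self.agent[0]][self.agent[1]] = 1
        grid[self.goal[0]][self.goal[1]] = 2
        return grid

    def step(self, action):
        dr, dc = {"up": (-1, 0), "down": (1, 0),
                  "left": (0, -1), "right": (0, 1)}[action]
        self.agent[0] = min(max(self.agent[0] + dr, 0), self.size - 1)
        self.agent[1] = min(max(self.agent[1] + dc, 0), self.size - 1)
        return self.observe(), self.agent == self.goal

def run_episode(env, policy, max_turns=200):
    """Generic loop: observe, act, repeat until solved or out of turns."""
    obs = env.observe()
    for turn in range(1, max_turns + 1):
        obs, solved = env.step(policy(obs))
        if solved:
            return turn
    return None

# A random policy may stumble onto the goal by luck:
rng = random.Random(42)
turns = run_episode(ToyGridEnv(), lambda obs: rng.choice(ToyGridEnv.ACTIONS))
```

The point of the benchmark is that a competent agent must do better than the random policy above by inferring the goal and the rules from observations alone, which is exactly what current frontier models fail to do.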
Continue reading on Dev.to


