
GPT-5, Claude, Gemini All Score Below 1% - ARC AGI 3 Just Broke Every Frontier Model
ARC-AGI-3, launched just yesterday on March 25, 2026, represents the most radical transformation of the ARC benchmark since François Chollet introduced it in 2019: it abandons static grid puzzles entirely in favor of interactive, video-game-like environments where AI agents must discover rules, set goals, and solve problems with zero instructions. The competition carries over $2 million in prizes across three tracks. Early preview results: frontier LLMs like GPT-5 and Claude score below 1%, while simple CNN and graph-search approaches reach 12.58%. The gap between human performance (100%) and the best AI agent remains enormous.

From grid puzzles to game worlds: what changed

ARC-AGI-3 is not an incremental difficulty upgrade; it is a fundamentally different benchmark. Previous versions (ARC-AGI-1 and ARC-AGI-2) presented static input-output grid pairs where systems inferred transformation rules and applied them. ARC-AGI-3 instead drops agents into turn-based game environments with no
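The turn-based setup described above can be sketched as a plain observe-act loop. Everything here is a hypothetical stand-in, not the actual ARC-AGI-3 API: the `ToyGridEnv` class, its observation format, and its action set are invented for illustration, assuming only that an agent receives raw observations and a fixed action set with no stated rules or goals.

```python
import random

class ToyGridEnv:
    """Hypothetical turn-based environment in the spirit of ARC-AGI-3:
    the agent sees only a grid and a fixed action set, with no
    instructions about the rules or the goal."""

    ACTIONS = ("up", "down", "left", "right")

    def __init__(self, size=5):
        self.size = size
        self.agent = [0, 0]
        self.goal = [size - 1, size - 1]  # never told to the agent

    def observe(self):
        # The agent gets raw cell values, not the rules behind them.
        grid = [[0] * self.size for _ in range(self.size)]
        grid[self.agent[0]][self.agent[1]] = 1
        grid[self.goal[0]][self.goal[1]] = 2
        return grid

    def step(self, action):
        dr, dc = {"up": (-1, 0), "down": (1, 0),
                  "left": (0, -1), "right": (0, 1)}[action]
        self.agent[0] = min(max(self.agent[0] + dr, 0), self.size - 1)
        self.agent[1] = min(max(self.agent[1] + dc, 0), self.size - 1)
        return self.observe(), self.agent == self.goal

def run_episode(env, policy, max_turns=200):
    """Generic loop: observe, act, repeat until solved or out of turns."""
    obs = env.observe()
    for turn in range(1, max_turns + 1):
        obs, solved = env.step(policy(obs))
        if solved:
            return turn
    return None

# A random policy may stumble onto the goal by luck:
rng = random.Random(42)
turns = run_episode(ToyGridEnv(), lambda obs: rng.choice(ToyGridEnv.ACTIONS))
```

The point of the benchmark is that a competent agent must do better than the random policy above by inferring the goal and the rules from observations alone, which is exactly what current frontier models fail to do.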
Continue reading on Dev.to


