
GPT-5.4 Mini Matches Human-Level Computer Use at 10% the Cost — Full Benchmark Breakdown
OpenAI shipped GPT-5.4 mini and nano on March 17. GitHub Copilot deployed mini within 24 hours. Here is what the benchmarks tell us.

The Numbers

GPT-5.4 mini scored 54.4% on SWE-Bench Pro. The full GPT-5.4? 57.7%. A 3.3-point gap, down from 12 points last generation. On OSWorld-Verified, mini hit 72.1% against a human baseline of 72.4%. The small model matches human-level computer operation.

| Benchmark | GPT-5.4 | GPT-5.4 Mini | GPT-5 mini |
|---|---|---|---|
| SWE-Bench Pro | 57.7% | 54.4% | 45.7% |
| OSWorld | 75.0% | 72.1% | 42.0% |
| GPQA Diamond | 93.0% | 88.0% | 81.6% |

The Subagent Play

The family is designed for multi-agent systems: GPT-5.4 plans, mini runs parallel subtasks, and nano handles classification at $0.20/M tokens. Hebbia's CTO reported that mini outperformed full GPT-5.4 on task-matched workloads.

Pricing

Mini: $0.75/M input (3x over GPT-5 mini). A 50K-token code review costs $0.08 vs $0.60+ on the flagship. That is 7-10x cheaper for 94% of the performance. The catch: small-model prices keep rising.
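The planner/subagent split described above can be sketched as a simple router: the flagship decomposes the task, cheaper models execute the pieces in parallel. A minimal sketch only; the model names come from the article, but the routing table, `call_model` stub, and tier labels are hypothetical illustrations, not OpenAI's API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tiering, mirroring the article's description:
# flagship plans, mini executes subtasks, nano classifies.
MODEL_FOR = {"plan": "gpt-5.4", "subtask": "gpt-5.4-mini", "classify": "gpt-5.4-nano"}

def call_model(model: str, prompt: str) -> str:
    # Stub standing in for a real completion call (e.g. via an LLM SDK).
    return f"[{model}] {prompt}"

def run_task(goal: str, subtasks: list[str]) -> list[str]:
    # 1. Flagship produces the plan.
    plan = call_model(MODEL_FOR["plan"], f"Plan: {goal}")
    # 2. Mini subagents run the subtasks concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: call_model(MODEL_FOR["subtask"], s), subtasks))
    # 3. Nano does the cheap classification/summary step.
    status = call_model(MODEL_FOR["classify"], "Summarize status")
    return [plan, *results, status]
```

The design point is that only step 1 pays flagship rates; the parallel fan-out, where token volume concentrates, runs on mini and nano.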
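The pricing claim is simple per-token arithmetic. A minimal sketch using the article's $0.75/M input rate for mini; the excerpt does not give an output rate, so the function leaves output cost as an optional, caller-supplied assumption:

```python
def request_cost(input_tokens: int, input_rate: float,
                 output_tokens: int = 0, output_rate: float = 0.0) -> float:
    """Dollar cost of one request, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 50K-token code review at mini's $0.75/M input rate (input side only).
print(round(request_cost(50_000, 0.75), 4))
```

Input alone comes to roughly $0.04, so the article's $0.08 figure presumably also counts output tokens.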
Continue reading on Dev.to


