
GPT-5.4 Mini Matches Human-Level Computer Use at 10% the Cost — Full Benchmark Breakdown
OpenAI shipped GPT-5.4 mini and nano on March 17. GitHub Copilot deployed mini within 24 hours. Here is what the benchmarks tell us.

The Numbers

GPT-5.4 mini scored 54.4% on SWE-Bench Pro. The full GPT-5.4? 57.7%. A 3.3-point gap, down from 12 points last generation. On OSWorld-Verified, mini hit 72.1% against a human baseline of 72.4%. The small model matches human-level computer operation.

| Benchmark | GPT-5.4 | GPT-5.4 Mini | GPT-5 mini |
|---|---|---|---|
| SWE-Bench Pro | 57.7% | 54.4% | 45.7% |
| OSWorld | 75.0% | 72.1% | 42.0% |
| GPQA Diamond | 93.0% | 88.0% | 81.6% |

The Subagent Play

The family is designed for multi-agent systems: GPT-5.4 plans, mini runs parallel subtasks, and nano handles classification at $0.20/M tokens. Hebbia's CTO reported that mini outperformed full GPT-5.4 on task-matched workloads.

Pricing

Mini: $0.75/M input (3x over GPT-5 mini). A 50K-token code review costs $0.08 vs $0.60+ on the flagship. That is 7-10x cheaper for 94% of the performance. The catch: small-model prices keep rising.
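The planner/subagent split described above can be sketched as a simple router: the flagship decomposes the task, cheaper models execute the pieces in parallel. A minimal sketch only; the model names come from the article, but the routing table, `call_model` stub, and tier labels are hypothetical illustrations, not OpenAI's API:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tiering, mirroring the article's description:
# flagship plans, mini executes subtasks, nano classifies.
MODEL_FOR = {"plan": "gpt-5.4", "subtask": "gpt-5.4-mini", "classify": "gpt-5.4-nano"}

def call_model(model: str, prompt: str) -> str:
    # Stub standing in for a real completion call (e.g. via an LLM SDK).
    return f"[{model}] {prompt}"

def run_task(goal: str, subtasks: list[str]) -> list[str]:
    # 1. Flagship produces the plan.
    plan = call_model(MODEL_FOR["plan"], f"Plan: {goal}")
    # 2. Mini subagents run the subtasks concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda s: call_model(MODEL_FOR["subtask"], s), subtasks))
    # 3. Nano does the cheap classification/summary step.
    status = call_model(MODEL_FOR["classify"], "Summarize status")
    return [plan, *results, status]
```

The design point is that only step 1 pays flagship rates; the parallel fan-out, where token volume concentrates, runs on mini and nano.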
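The pricing claim is simple per-token arithmetic. A minimal sketch using the article's $0.75/M input rate for mini; the excerpt does not give an output rate, so the function leaves output cost as an optional, caller-supplied assumption:

```python
def request_cost(input_tokens: int, input_rate: float,
                 output_tokens: int = 0, output_rate: float = 0.0) -> float:
    """Dollar cost of one request, with rates quoted per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 50K-token code review at mini's $0.75/M input rate (input side only).
print(round(request_cost(50_000, 0.75), 4))
```

Input alone comes to roughly $0.04, so the article's $0.08 figure presumably also counts output tokens.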
Continue reading on Dev.to


