
5 Models, 467 Actions, 1 Winner — What We Learned Comparing LLMs on Real Code Generation
We tested five AI models on the same task 467 times. Each run produced a complete, deployable website: not a code snippet, not a function, not a patch, but a real site with HTML, CSS, JavaScript, and assets. The question: can cheaper models match Claude Sonnet for production code generation? The short answer is no. The longer answer is more interesting.

## The Models

Five models, spanning a 15x cost range:

| Model | Provider | Input / 1M Tokens | Output / 1M Tokens | Why We Tested It |
|---|---|---|---|---|
| Claude Sonnet 4.6 | OpenRouter | $3.00 | $15.00 | Assumed gold standard |
| Claude Haiku 4.5 | OpenRouter/CLI | $1.00 | $5.00 | Same family, lower tier |
| Kimi K2.5 | OpenRouter | $0.42 | $2.20 | Moonshot AI's latest |
| DeepSeek V3.2 | OpenRouter | $0.26 | $0.38 | Budget option |
| DeepSeek R1 | OpenRouter | $0.70 | $2.50 | Reasoning-focused |

These five represent distinct price tiers and architectural approaches. Sonnet and Haiku share a lineage. Kimi is multimodal. DeepSeek V3.2 optimises for cost. R1 optimises for step-by-step reasoning.

## The 16-Action Pipeline

Each model received …
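To make the price spread concrete, here is a minimal sketch that turns the per-million-token rates above into a per-run cost. The token counts are hypothetical placeholders, not figures from the article, so the resulting ratio illustrates the spread rather than reproducing the article's numbers; the exact multiple depends on the input/output mix of a real run.

```python
# Per-million-token prices (USD) from the comparison table: (input, output).
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Kimi K2.5": (0.42, 2.20),
    "DeepSeek V3.2": (0.26, 0.38),
    "DeepSeek R1": (0.70, 2.50),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single generation run at the listed rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical per-run token counts (assumed for illustration only).
IN_TOK, OUT_TOK = 50_000, 20_000

sonnet = run_cost("Claude Sonnet 4.6", IN_TOK, OUT_TOK)
budget = run_cost("DeepSeek V3.2", IN_TOK, OUT_TOK)
print(f"Sonnet: ${sonnet:.3f}  DeepSeek V3.2: ${budget:.4f}  ratio: {sonnet / budget:.0f}x")
```

At this assumed mix, one Sonnet run costs roughly twenty times one DeepSeek V3.2 run, which is why the "can cheaper models match Sonnet?" question is worth 467 trials.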
Continue reading on Dev.to Webdev



