
# The Model Isn't the Bottleneck — Your Prompt Structure Is
## The Experiment

Chris Laub (@ChrisLaubAI) ran an experiment that should change how you think about model selection. He built the same application five times, once with each major LLM, and tested five different prompt formatting styles across all of them.

The top scores by model (best prompt style for each):

| Model | Best Score | Best Format |
|----------|-----------|-------------|
| Claude | 87 | XML |
| GPT-4 | 71 | Markdown |
| Grok | 68 | — |
| Gemini | 64 | — |
| DeepSeek | 52 | — |

Claude with XML prompts dominated. But here's the more interesting finding: Claude scored 89 with Markdown prompts too. The model was strong regardless of format, while every other model showed dramatic swings depending on prompt structure.

## The Real Takeaway: Structure > Model

The gap between Claude's best and DeepSeek's best is 35 points. That's a model gap, and it's real. But look at it from a different angle: for several models, the gap between their best and worst prompt style was comparable. Changing how you structure your prompt can matter as much as changing which model you use.
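To make the "prompt formatting style" comparison concrete, here is a minimal sketch of what an XML-structured versus a Markdown-structured version of the same task might look like. These are illustrative examples, not Laub's actual test prompts, and the section names (`role`, `input`, `instructions`) are assumptions:

```python
# Illustrative only: the same review task delimited two different ways.
# Neither prompt is from the original experiment.

# XML style: sections fenced with explicit open/close tags.
xml_prompt = """<task>
  <role>You are a code reviewer.</role>
  <input>def add(a, b): return a + b</input>
  <instructions>Suggest one improvement to the function above.</instructions>
</task>"""

# Markdown style: the same sections marked with headings instead of tags.
markdown_prompt = """## Role
You are a code reviewer.

## Input
def add(a, b): return a + b

## Instructions
Suggest one improvement to the function above."""
```

The content is identical in both; only the delimiters change. The experiment's point is that, for most models, this surface-level choice alone can move scores substantially.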
Continue reading on Dev.to


