
# The Model Isn't the Bottleneck — Your Prompt Structure Is
## The Experiment

Chris Laub (@ChrisLaubAI) ran an experiment that should change how you think about model selection. He built the same application five times, once with each major LLM, and tested five different prompt formatting styles across all of them.

The top scores by model (best prompt style for each):

| Model | Best Score | Best Format |
|----------|-----------|-------------|
| Claude | 87 | XML |
| GPT-4 | 71 | Markdown |
| Grok | 68 | — |
| Gemini | 64 | — |
| DeepSeek | 52 | — |

Claude with XML prompts dominated. But here's the more interesting finding: Claude scored 89 with Markdown prompts too. The model was strong regardless of format, while every other model showed dramatic swings depending on prompt structure.

## The Real Takeaway: Structure > Model

The gap between Claude's best and DeepSeek's best is 35 points. That's a model gap, and it's real. But look at it from a different angle: for several models, the gap between their best and worst prompt style was comparable. Changing how you structure your prompt can matter as much as changing which model you use.
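To make the "prompt formatting style" comparison concrete, here is a minimal sketch of what an XML-structured versus a Markdown-structured version of the same task might look like. These are illustrative examples, not Laub's actual test prompts, and the section names (`role`, `input`, `instructions`) are assumptions:

```python
# Illustrative only: the same review task delimited two different ways.
# Neither prompt is from the original experiment.

# XML style: sections fenced with explicit open/close tags.
xml_prompt = """<task>
  <role>You are a code reviewer.</role>
  <input>def add(a, b): return a + b</input>
  <instructions>Suggest one improvement to the function above.</instructions>
</task>"""

# Markdown style: the same sections marked with headings instead of tags.
markdown_prompt = """## Role
You are a code reviewer.

## Input
def add(a, b): return a + b

## Instructions
Suggest one improvement to the function above."""
```

The content is identical in both; only the delimiters change. The experiment's point is that, for most models, this surface-level choice alone can move scores substantially.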
Continue reading on Dev.to


