
# When More Examples Make Your LLM Worse: Discovering Few-Shot Collapse
Here's something everyone agrees on about few-shot prompting: give the model more examples and it performs better. I believed that too. Then I measured it. So I built AdaptGauge, an open-source tool that measures how efficiently LLMs learn from few-shot examples.

## What I tested

I evaluated eight models across four tasks designed to mirror real business scenarios, at shot counts of 0, 1, 2, 4, and 8:

- **Classification**: Categorize customer support inquiries into one of 8 categories (billing, technical support, returns, etc.)
- **Code Fix**: Identify and fix bugs in short Python functions (off-by-one errors, missing edge cases)
- **Summarization**: Extract key points from Japanese news articles into bullet-point summaries
- **Route Optimization**: Calculate optimal delivery routes across multiple destinations with time windows and fuel costs

Models tested:

- **Cloud APIs**: Claude Haiku 4.5, Claude Opus 4.5, Gemini 2.5 Flash, Gemini 3 Flash, Gemini 3 Pro
- **Local models**: Gemma 3 27B, GPT-OSS 120B, Qwen3-VL 8B

For e
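To make the setup concrete, here is a minimal sketch of the kind of shot-count sweep described above. It is not AdaptGauge's actual code; the `complete` callable, the `Input:`/`Output:` prompt template, and the exact-match scoring are all assumptions standing in for whatever client and grading a real harness would use.

```python
# Sketch of a shot-count sweep, assuming a generic `complete(prompt) -> str`
# model call (hypothetical; swap in your actual API client).
from typing import Callable

SHOT_COUNTS = [0, 1, 2, 4, 8]

def build_prompt(instruction: str, examples: list[tuple[str, str]],
                 query: str, n_shots: int) -> str:
    """Prepend the first n_shots worked examples to the query."""
    parts = [instruction]
    for x, y in examples[:n_shots]:
        parts.append(f"Input: {x}\nOutput: {y}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

def sweep(complete: Callable[[str], str], instruction: str,
          examples: list[tuple[str, str]],
          test_set: list[tuple[str, str]]) -> dict[int, float]:
    """Exact-match accuracy at each shot count.

    Accuracy that *drops* as the shot count grows is the
    "few-shot collapse" pattern the article is measuring.
    """
    results: dict[int, float] = {}
    for n in SHOT_COUNTS:
        correct = sum(
            complete(build_prompt(instruction, examples, q, n)).strip() == gold
            for q, gold in test_set
        )
        results[n] = correct / len(test_set)
    return results
```

Keeping the example pool and test set fixed while varying only `n_shots` isolates the effect of example count from example choice, which is the comparison the 0/1/2/4/8 design needs.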




