
Context Windows Are Lying to You: How to Actually Use 128K Tokens
Every model brags about context windows now. 128K tokens. 200K tokens. "Paste your entire codebase!" the marketing says. I tried it. I pasted 80K tokens of a Node.js project into Claude and asked it to find a bug. It found a bug: in a file I didn't care about, while ignoring the actual issue in the file I mentioned. Here's what I learned about context windows the hard way.

The Attention Problem

Large context windows don't mean the model pays equal attention to everything. Research on "lost in the middle" showed that LLMs disproportionately focus on the beginning and end of the context, with reduced attention in the middle. In practice, this means:

File 1 of 50: high attention ✓
Files 2-49: declining attention ✗
File 50: high attention ✓
Your actual question at the end: high attention ✓

So if your bug is in file 27, the model literally pays less attention to it, even though it's "in context."

The Cost Problem

128K input tokens on GPT-4o costs about $0.32. That sounds cheap until you'r
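The cost figure is simple arithmetic: token count times the per-token rate. A minimal sketch, assuming GPT-4o's input rate of roughly $2.50 per million tokens (rates change; check current pricing before relying on this):

```python
def input_cost(tokens: int, usd_per_million: float = 2.50) -> float:
    """Input-side cost in USD at a given per-million-token rate.

    The default rate is an assumption based on GPT-4o's published
    input pricing at the time of writing; it is not fetched live.
    """
    return tokens / 1_000_000 * usd_per_million


# A full 128K-token context at the assumed rate:
print(f"${input_cost(128_000):.2f}")  # → $0.32
```

Run this per request, not per session: if you re-send the same 128K context on every turn of a ten-turn conversation, that $0.32 becomes $3.20.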
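Circling back to the attention problem: since attention concentrates at the edges of the context, one practical workaround is to order your files so the one you actually care about lands last, right before the question. A minimal sketch of that packing strategy; `build_prompt`, the file names, and the `===` delimiters are all hypothetical, not any library's API:

```python
def build_prompt(files: dict[str, str], target: str, question: str) -> str:
    """Pack files into a prompt with the target file last.

    "Lost in the middle" results suggest content at the end of the
    context gets more attention, so the file under discussion goes
    just before the question instead of wherever dict order puts it.
    """
    # Everything except the target first, then the target itself.
    ordered = [name for name in files if name != target] + [target]
    parts = [f"=== {name} ===\n{files[name]}" for name in ordered]
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


prompt = build_prompt(
    {"a.js": "...", "b.js": "...", "server.js": "const x = 1;"},
    target="server.js",
    question="Find the bug in server.js",
)
```

This doesn't fix the underlying attention skew, but it stops you from burying the one file that matters at position 27 of 50.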




