
Context Windows Are Lying to You: How to Actually Use 128K Tokens
Every model brags about context windows now. 128K tokens. 200K tokens. "Paste your entire codebase!" the marketing says. I tried it. I pasted 80K tokens of a Node.js project into Claude and asked it to find a bug. It found a bug: in a file I didn't care about, while ignoring the actual issue in the file I mentioned. Here's what I learned about context windows the hard way.

The Attention Problem

Large context windows don't mean the model pays equal attention to everything. Research on "lost in the middle" showed that LLMs disproportionately focus on the beginning and end of the context, with reduced attention in the middle. In practice, this means:

File 1 of 50: high attention ✓
Files 2-49: declining attention ✗
File 50: high attention ✓
Your actual question at the end: high attention ✓

So if your bug is in file 27, the model literally pays less attention to it, even though it's "in context."

The Cost Problem

128K input tokens on GPT-4o costs about $0.32. That sounds cheap until you'r
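The cost figure is simple arithmetic: token count times the per-token rate. A minimal sketch, assuming GPT-4o's input rate of roughly $2.50 per million tokens (rates change; check current pricing before relying on this):

```python
def input_cost(tokens: int, usd_per_million: float = 2.50) -> float:
    """Input-side cost in USD at a given per-million-token rate.

    The default rate is an assumption based on GPT-4o's published
    input pricing at the time of writing; it is not fetched live.
    """
    return tokens / 1_000_000 * usd_per_million


# A full 128K-token context at the assumed rate:
print(f"${input_cost(128_000):.2f}")  # → $0.32
```

Run this per request, not per session: if you re-send the same 128K context on every turn of a ten-turn conversation, that $0.32 becomes $3.20.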
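Circling back to the attention problem: since attention concentrates at the edges of the context, one practical workaround is to order your files so the one you actually care about lands last, right before the question. A minimal sketch of that packing strategy; `build_prompt`, the file names, and the `===` delimiters are all hypothetical, not any library's API:

```python
def build_prompt(files: dict[str, str], target: str, question: str) -> str:
    """Pack files into a prompt with the target file last.

    "Lost in the middle" results suggest content at the end of the
    context gets more attention, so the file under discussion goes
    just before the question instead of wherever dict order puts it.
    """
    # Everything except the target first, then the target itself.
    ordered = [name for name in files if name != target] + [target]
    parts = [f"=== {name} ===\n{files[name]}" for name in ordered]
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


prompt = build_prompt(
    {"a.js": "...", "b.js": "...", "server.js": "const x = 1;"},
    target="server.js",
    question="Find the bug in server.js",
)
```

This doesn't fix the underlying attention skew, but it stops you from burying the one file that matters at position 27 of 50.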




