
The 5-Hour Quota, Boris's Tweet, and What the Source Code Actually Reveals
Yesterday I published a deep dive into Claude Code's compaction engine. At the end, I made a promise: go deeper on the caching optimizations that happen outside of compaction. But actually, the caching rabbit hole started before that post - because of a tweet from about ten days ago. The Tweet That Confused Me If you're a heavy Claude Code user, you felt the 5-hour usage cap snap shut after Anthropic's two-week promotional window closed. The complaints flooded in. Someone tagged Boris - the engineer behind Claude Code, the person who built it - asking what he planned to do about it. His answer: improvements are coming to squeeze more out of the current quota. My first reaction: what can he possibly do? The quota is server-side. It's rate limits and token budgets. There's no client trick that changes how many tokens you're allowed per hour. That question sat with me. Then yesterday's compaction post led me to look harder at the source - and the answer became obvious. Cache Hit Ratio Is
Continue reading on Dev.to
Opens in a new tab



