I Built a 50-Line RAG System That Saves Me 10x Tokens in Claude Code

Every Claude Code user hits the same wall: you ask a question about your codebase, Claude reads 5 files, burns 30K tokens, and your context window is half gone before you've written a single line of code.

I fixed this with a local RAG system. 50 lines of Python, zero API costs, 6-10x token savings on every semantic search. Here's exactly how I built it and the real numbers from a 22,000-file Unity project.

The Problem: Claude Code Eats Context for Breakfast

I work on a large Unity mobile game with 22,000+ C# files. When I ask Claude Code something like "how does the energy system handle timer refills?", here's what happens:

1. Claude runs grep for "energy" and "timer" and finds 47 matches across 12 files
2. Reads EnergyManager.cs (187 lines): that's relevant
3. Reads EnergyCountDownTimer.cs (32 lines): also relevant
4. Reads NotificationManager.cs (1,278 lines): only 12 lines are about energy
5. Maybe reads another file or two just to be sure

Total: ~6,000 tokens consumed. And Claude only needed ab
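The file sizes quoted in the walkthrough give a rough sense of how much of each read is wasted. Here is a back-of-the-envelope sketch, assuming token count scales roughly with line count and that the two smaller files are relevant in full. Both are assumptions for illustration, not measurements from the project, and `read_overhead` is a hypothetical helper name.

```python
# Line counts quoted in the walkthrough: (lines read, lines relevant).
# Treating the two smaller files as fully relevant is an assumption.
files = {
    "EnergyManager.cs": (187, 187),
    "EnergyCountDownTimer.cs": (32, 32),
    "NotificationManager.cs": (1278, 12),
}


def read_overhead(files):
    """Return (lines_read, lines_relevant, ratio) for whole-file reads."""
    lines_read = sum(read for read, _ in files.values())
    lines_relevant = sum(relevant for _, relevant in files.values())
    return lines_read, lines_relevant, lines_read / lines_relevant


lines_read, lines_relevant, ratio = read_overhead(files)
print(f"read {lines_read} lines, needed {lines_relevant}, overhead {ratio:.1f}x")
```

Under these assumptions the three whole-file reads cover 1,497 lines to reach 231 relevant ones, an overhead of about 6.5x, which sits in the same range as the 6-10x savings claimed above.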
