
I built a context engine that saves Claude Code 73% of its tokens on large codebases
The problem

LLM coding agents burn tokens scanning files on large repos. Claude Code on an 829-file codebase consumed 45K tokens just finding the right code. By turn 3 of a conversation, the context window is nearly exhausted, and the cost compounds: 20 questions in a session at 45K tokens each is 900K tokens, nearly the entire 1M window. The agent degrades before your work is done.

What Mnemosyne does

Mnemosyne sits between your codebase and your LLM. It indexes your code into SQLite, scores every chunk with six retrieval signals (BM25, TF-IDF, symbol search, usage frequency, predictive prefetch, and optional dense embeddings), compresses with AST awareness, and delivers exactly within your token budget.

Zero runtime dependencies: `pip install mnemosyne-engine`. No API keys, no cloud, no Docker. Works offline. Drop-in integration: add three lines to your CLAUDE.md or .cursorrules and the agent queries Mnemosyne before reading files.

The benchmarks

Include the full benchmark table from the README. Be specific: "Claude Opus 4.6 with 1M context, tested a
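To make the retrieval-and-budget idea above concrete, here is a minimal sketch of one of the six signals. This is not Mnemosyne's actual API or code; the function names (`bm25_scores`, `pack_within_budget`) and the word-count token estimate are illustrative assumptions. It shows the core loop: score chunks lexically with BM25, then greedily pack the best ones into a fixed token budget.

```python
# Illustrative sketch only -- NOT Mnemosyne's real implementation or API.
# Scores chunks with plain BM25 and greedily fills a token budget.
import math
from collections import Counter

def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """BM25 score of each chunk against the query (whitespace tokenization)."""
    docs = [c.lower().split() for c in chunks]
    avgdl = sum(len(d) for d in docs) / len(docs)
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query.lower().split():
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def pack_within_budget(query, chunks, budget_tokens):
    """Return the highest-scoring chunks whose combined size fits the budget."""
    ranked = sorted(zip(bm25_scores(query, chunks), chunks), reverse=True)
    picked, used = [], 0
    for score, chunk in ranked:
        cost = len(chunk.split())  # crude stand-in for a real token count
        if score > 0 and used + cost <= budget_tokens:
            picked.append(chunk)
            used += cost
    return picked
```

In the real engine this lexical score would be blended with the other five signals before packing, and compression would shrink each chunk's cost further; the budget cap is what keeps a 20-question session from eating the whole context window.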




