
Symbols, Not Chunks: 3.9x Fewer Tokens

AST-Based Retrieval Cuts LLM Code Context 1.6-3.9x vs. LangChain RAG on Real Codebases

J. Gravelle
March 2026

Abstract

Large language models (LLMs) consume tokens in proportion to the context they receive. When applied to code understanding tasks, the dominant retrieval strategy, chunk-based Retrieval-Augmented Generation (RAG) using vector embeddings, injects substantial irrelevant context, wastes tokens, and frequently delivers fragments that split functions mid-definition. This paper presents an alternative: AST-based symbol retrieval, which uses tree-sitter parsing to extract complete syntactic units (functions, classes, methods) and serves them via deterministic lookup. We benchmark both approaches on three open-source web frameworks (Express.js, FastAPI, Gin) totaling 1,214 files and 1,024,421 baseline tokens. In a head-to-head comparison against a naive fixed-chunk RAG pipeline (LangChain + FAISS + MiniLM-L6-v2), AST retrieval uses 1.6-3.9x fewer tokens per query on ever…
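The abstract describes the core mechanism: parse source into an AST, index each complete symbol (function, class, method) by name, then serve whole definitions via deterministic lookup instead of embedding-similarity over fixed-size chunks. The paper's pipeline uses tree-sitter; as a minimal, self-contained sketch of the same idea, the example below uses Python's standard-library ast module instead, and the sample source and helper name extract_symbols are illustrative assumptions, not the paper's implementation.

```python
import ast

# Illustrative sample source, not from the paper's benchmark corpus.
SOURCE = '''
def handler(req):
    return req.json()

class Router:
    def add_route(self, path, fn):
        self.routes[path] = fn
'''

def extract_symbols(source: str) -> dict[str, str]:
    """Map each symbol name to its complete source text.

    Unlike fixed-size chunking, every entry is a whole syntactic
    unit, so a lookup can never return a function split mid-definition.
    """
    tree = ast.parse(source)
    symbols = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            symbols[node.name] = ast.get_source_segment(source, node)
    return symbols

index = extract_symbols(SOURCE)
# Deterministic lookup: retrieving "handler" yields exactly its
# definition, rather than the k nearest embedding chunks.
print(index["handler"])
```

A query for one symbol injects only that symbol's tokens into the LLM context, which is where the reported 1.6-3.9x reduction over chunk retrieval would come from.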
Continue reading on Dev.to


