
# Production RAG: Lessons from Real Deployments
Everyone's building RAG (Retrieval-Augmented Generation) systems. Most won't survive production. Here's what works.

## Why RAG Breaks in Production

Tutorials make it easy: chunk, embed, prompt. Real data breaks everything. Common failures:

- **Chunking destroys context** — tables and references split across chunks
- **Embedding drift** — new docs don't align with old embeddings
- **Retrieval-generation gap** — the LLM answers confidently from the wrong chunk

## Patterns That Work

### Hierarchical Chunking

Don't chunk by token count. Use document structure:

```python
def smart_chunk(document):
    sections = split_by_headers(document)
    paragraphs = flatten_paragraphs(sections)
    sentences = extract_key_sentences(paragraphs)
    return sections + paragraphs + sentences
```

### Re-ranking After Retrieval

Vector similarity is a rough filter. Add cross-encoder re-ranking:

```python
candidates = vector_store.search(query, top_k=20)
reranked = cross_encoder.rank(query, candidates)
context = reranked[:5]
```

### Citation Tracking

Force the mo




