
What Actually Breaks When You Put RAG in Production
Most RAG tutorials show you how to split documents, embed them, and query a vector store. That part works in a weekend. The part that takes weeks is everything that breaks once real users hit it. I've built RAG systems for code review automation, research synthesis, and data extraction pipelines. Here's what I wish someone had told me before the first deployment.

1. Chunking Strategy Is Your Biggest Lever

The default "split at 500 tokens with 50 token overlap" works for blog posts. It falls apart on structured data.

For code: chunk by function/class boundaries, not token count. A function split across two chunks loses its meaning in both. AST-aware chunking is worth the complexity.

For legal/financial docs: chunk by section headers. A clause that spans two chunks will be retrieved partially, and partial legal text is worse than no text.

For conversations/logs: chunk by turn or time window. Overlapping chunks cause duplicate retrieval that confuses the synthesis step.

The pattern : matc
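To make the "chunk by function/class boundaries" advice for code concrete, here is a minimal sketch using Python's standard `ast` module. It is not the author's implementation. The names `CodeChunk` and `chunk_python_source` are hypothetical, the sketch assumes Python 3.9+ and Python source as input, and it only emits top-level functions and classes (imports and other loose module-level statements are skipped).

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """Hypothetical container for one retrievable unit of code."""
    name: str        # function or class name
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str        # the exact source lines of the definition


def chunk_python_source(source: str) -> list[CodeChunk]:
    """Return one chunk per top-level function or class definition.

    Each definition stays whole, so a function is never split
    across two chunks the way a fixed token window would split it.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[CodeChunk] = []

    for node in tree.body:
        if not isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        ):
            # Assumption: loose statements (imports, constants) are ignored.
            continue

        # node.lineno points at the def/class keyword; decorators sit
        # above it, so start at the earliest decorator if there is one.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno  # available since Python 3.8

        chunks.append(
            CodeChunk(
                name=node.name,
                start_line=start,
                end_line=end,
                text="\n".join(lines[start - 1:end]),
            )
        )
    return chunks


if __name__ == "__main__":
    example = "def add(a, b):\n    return a + b\n\nclass Greeter:\n    pass\n"
    for chunk in chunk_python_source(example):
        print(chunk.name, chunk.start_line, chunk.end_line)
```

A class is kept as a single chunk here. For very large classes, one possible refinement is to recurse into the class body and emit one chunk per method, carrying the class name along as metadata.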
Continue reading on Dev.to Python


