
One query is never enough: why top RAG systems search three times
LangChain has MultiQueryRetriever. LlamaIndex has SubQuestionQueryEngine. Every serious RAG framework decomposes user questions into multiple search queries before hitting the vector database. Why? Because a single embedding compresses your entire question into one point in vector space, and one point can only land in one neighborhood.

Take this question: "How do I fix a slow database connection in my Flask app?" Three concepts, three clusters in embedding space:

- Database connections - pooling, timeouts, driver configuration
- Flask-specific patterns - SQLAlchemy setup, app factory patterns, teardown handling
- Performance diagnostics - profiling, query logging, bottleneck identification

Embed the full question, and the resulting vector lands in the "Flask + database" neighborhood. The performance diagnostics cluster is invisible. You get back five results about Flask and database setup, and zero about profiling or bottleneck identification. This is not about relationships between entities (
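The effect is easy to demonstrate without any framework. Here is a minimal sketch of multi-query retrieval using a toy bag-of-words "embedding" and an in-memory document list (all names, documents, and sub-queries are illustrative, not from LangChain or LlamaIndex; in a real system an LLM generates the sub-queries and a trained model produces the embeddings):

```python
from collections import Counter
import math

# Toy "embedding": a bag-of-words vector. A stand-in for a real embedding
# model, which maps semantically related text to nearby points.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One document per concept cluster (hypothetical corpus).
DOCS = [
    "flask sqlalchemy app factory teardown setup",
    "database connection pooling timeouts driver configuration",
    "profiling query logging bottleneck identification performance",
]

def search(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    ranked = sorted(((d, cosine(q, embed(d))) for d in DOCS),
                    key=lambda pair: pair[1], reverse=True)
    return [d for d, _ in ranked[:k]]

question = "How do I fix a slow database connection in my Flask app?"

# Multi-query retrieval: one query per concept, search each, merge results.
# These sub-queries are hand-written here; a framework would generate them.
sub_queries = [
    "flask sqlalchemy setup",
    "database connection pooling timeouts",
    "profiling bottleneck identification",
]

merged = []
for sq in sub_queries:
    for doc in search(sq, k=1):
        if doc not in merged:
            merged.append(doc)

# Single-query retrieval scores the diagnostics document at zero: the full
# question shares no tokens with it, so that cluster is invisible.
diagnostics_score = cosine(embed(question), embed(DOCS[2]))
```

Even in this crude model, the merged multi-query results cover all three clusters, while the single full-question vector gives the performance-diagnostics document a similarity of zero.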



