
To Embed or Not to Embed? That Is the Question.

Another story from my series about BookMind, my grammar RAG assistant, and this time it really annoyed me. A student asked: "Explain the Past Simple tense." The system gave a decent explanation. Then the student said: "Give me an exercise on this topic." Instead of pulling an exercise from the same unit, the model brought back something from a completely different section. The conversation broke. That was the moment I finally added a proper reranker.

What changed in the pipeline

```python
from sentence_transformers import CrossEncoder

# Stage 1: Hybrid retrieval (25 candidates)
candidates = retriever.invoke(question)

# Stage 2: Cross-Encoder reranking
scores = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2') \
    .predict([[question, doc.page_content] for doc in candidates])

# Stage 3: Only the best 5 go to the LLM
# (sort by score alone, so tied scores never try to compare Document objects)
final_context = [doc for _, doc in sorted(zip(scores, candidates),
                                          key=lambda pair: pair[0],
                                          reverse=True)][:5]
```

Real conversation after adding reranker

Student asks for the rule → system correctly
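To make the rerank-and-truncate step concrete, here is a minimal, self-contained sketch. A dummy word-overlap scorer stands in for the cross-encoder call (so no model download is needed); the sorting and truncation mirror the pipeline above. The `dummy_score` and `rerank` names, and the sample candidate texts, are illustrative, not part of BookMind.

```python
def dummy_score(question: str, passage: str) -> float:
    # Hypothetical relevance score: fraction of question words found in the
    # passage. A real cross-encoder scores the (question, passage) pair jointly.
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def rerank(question: str, candidates: list[str], top_k: int = 5) -> list[str]:
    scores = [dummy_score(question, doc) for doc in candidates]
    # Sort by score only (key=...), so ties never compare the documents themselves.
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

# Toy candidate pool, as hybrid retrieval might return it.
candidates = [
    "The Past Simple tense describes finished actions.",
    "Irregular verbs in the Present Perfect.",
    "Exercise: put the verbs in the Past Simple tense.",
    "Articles: a, an, the.",
    "Past Simple tense: negative forms and questions.",
    "Modal verbs of obligation.",
]

top = rerank("Explain the Past Simple tense", candidates, top_k=3)
```

With the dummy scorer, the three surviving passages are exactly the Past Simple ones, which is the behavior the reranker is meant to guarantee before the context reaches the LLM.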
Continue reading on Dev.to



