
# Why I debug my RAG pipeline stage by stage, not end to end
## The problem with end-to-end RAG eval

I had a working document retrieval pipeline: fixed-size chunking, TF-IDF embeddings, FAISS index. Recall@10 was 0.82 on SciFact. Good enough. Then I made one change: I swapped fixed-size chunking for sentence-based chunking. Recall dropped to 0.68.

My first instinct was to roll back. But I wanted to understand why. End-to-end eval only told me "retrieval is worse." It couldn't tell me which stage was responsible.

## The debugging approach

I restructured the pipeline so each stage can be evaluated independently. The pipeline is expressed as a string feature chain:

```python
from mloda.user import mlodaAPI, PluginCollector

# The full pipeline: each __ is a stage boundary
results = mlodaAPI.run_all(
    features=["docs__pii_redacted__chunked__deduped__embedded"],
    ...
)
```

Stop at chunking? `"docs__pii_redacted__chunked"`. Skip dedup? `"docs__pii_redacted__chunked__embedded"`. Add evaluation? `"docs__pii_redacted__chunked__deduped__embedded__evaluation"`. Each stage...
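The "stop anywhere in the chain" idea can be sketched without the library itself. Below is a minimal plain-Python illustration (the `stage_prefixes` helper is mine, not part of mloda): given the full feature chain, it generates every intermediate feature name, each of which could be evaluated on its own.

```python
def stage_prefixes(chain: str, sep: str = "__") -> list[str]:
    """Return each evaluable prefix of a __-separated feature chain."""
    parts = chain.split(sep)
    # Skip the bare source ("docs"); every later prefix is a stage boundary.
    return [sep.join(parts[: i + 1]) for i in range(1, len(parts))]

full = "docs__pii_redacted__chunked__deduped__embedded"
for feature in stage_prefixes(full):
    print(feature)
# docs__pii_redacted
# docs__pii_redacted__chunked
# docs__pii_redacted__chunked__deduped
# docs__pii_redacted__chunked__deduped__embedded
```

Each printed string is a valid pipeline in its own right, which is what makes per-stage evaluation possible.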
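For reference, the Recall@10 numbers quoted above follow the standard definition: the fraction of relevant documents that appear in the top k retrieved results. A minimal sketch (plain Python, not code from the pipeline itself):

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant doc ids found in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

# 2 of the 3 relevant docs appear in the top k
print(recall_at_k(["d1", "d7", "d3"], {"d1", "d3", "d9"}, k=10))
```

Averaging this per-query score over the SciFact query set gives the single Recall@10 number that end-to-end eval reports, which is exactly why it can't localize a regression to one stage.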
Continue reading on Dev.to