
# Building a Production RAG Pipeline That Actually Works: Lessons from DocExtract

## The Architecture (and Why It's 3 Services, Not 1)

DocExtract is split into three services: an API, a worker, and a frontend.

User uploads PDF → API validates and enqueues job (ARQ/Redis) → Worker picks up job asynchronously → chunk + embed → pgvector store → BM25 index built in memory at retrieval time → API streams SSE progress to frontend → User queries with natural language → hybrid retrieval → Claude generates answer with citations

Why not one FastAPI service? Because document processing is slow (2-8 seconds per page), and you don't want your API workers blocked. The ARQ queue decouples upload from processing, which lets you scale workers independently and gives you a natural retry boundary.

The async split also means you can add real-time progress streaming (SSE) to the frontend without any threading complexity: the worker updates job state in Redis, the API polls it, and the frontend gets a 12-step progress bar that actually reflects what's happening.

The full system has 1,060 tests.
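The post doesn't show DocExtract's job-state schema, so the step names and payload shape below are illustrative, not the project's actual code. Still, the SSE side of the pattern is small enough to sketch: each progress update is serialized as one `text/event-stream` frame (field lines followed by a blank line), which a browser `EventSource` can consume directly.

```python
import json

PIPELINE_STEPS = 12  # the post mentions a 12-step progress bar

def format_sse_event(step: int, label: str) -> str:
    """Serialize one progress update as a text/event-stream frame.

    An SSE frame is one or more "field: value" lines followed by a
    blank line; the browser's EventSource API dispatches the `event`
    name and hands the `data` payload to the listener.
    """
    payload = {"step": step, "total": PIPELINE_STEPS, "label": label}
    return f"event: progress\ndata: {json.dumps(payload)}\n\n"

if __name__ == "__main__":
    # In the real pipeline the worker would write this state to Redis and
    # the API would poll it and stream frames like this one to the client.
    print(format_sse_event(3, "chunking"), end="")
```

With FastAPI, frames like these would typically be yielded from a `StreamingResponse` with `media_type="text/event-stream"`; the double newline is what marks the end of each event.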
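The post says "hybrid retrieval" over BM25 and pgvector results but doesn't name the fusion method, so this sketch assumes reciprocal rank fusion (RRF), a common rank-based way to merge a lexical and a semantic ranking; the document IDs and `k` value are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one combined ranking.

    Each document scores sum(1 / (k + rank)) across the lists it
    appears in, so documents ranked well by both retrievers rise to
    the top. k=60 is the constant from the original RRF paper.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits: one list from the in-memory BM25 index, one from pgvector.
bm25_hits = ["d3", "d1", "d7"]
vector_hits = ["d1", "d7", "d9"]
print(reciprocal_rank_fusion([bm25_hits, vector_hits]))  # d1 and d7 win: both lists agree on them
```

Rank-based fusion like this sidesteps the problem that BM25 scores and cosine similarities live on incompatible scales, which is why it's a popular default for hybrid setups.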



