
We Traced One Query Through Perplexity’s Entire Stack in Cohort – Here’s What Actually Happens in 3 Seconds
It was about 90 minutes into the session. We’d just finished building a RAG pipeline from scratch in Python, the kind where you stare at FAISS indices and embeddings and wonder if you’ll ever actually deploy this in prod. The instructor stopped scrolling through code and looked up.

“Alright,” he said. “Let’s stop pretending we’re building search. Let’s trace one live query through Perplexity. See what actually happens in the 3 seconds between you hitting enter and reading the answer.”

The room got quiet. Someone typed the question.

First, the Simple RAG Pattern (So We Have a Baseline)

If you’ve built any RAG system, you know the dance:

- Ingest — chunk documents, embed them, store in a vector DB
- Retrieve — embed the query, find the closest chunks
- Augment — inject those chunks into a prompt
- Generate — LLM answers, grounded in your docs

That’s the 40-line Python version. Clean. Predictable. Works great when your docs are a handful of policy PDFs. But Perplexity isn’t doing that with 10 PDFs.
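To make the baseline concrete, here is a minimal in-memory sketch of that four-step loop. Everything in it is illustrative: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the list-backed store stands in for a vector DB like FAISS, and the `llm` callable is whatever client you actually use.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' -- a stand-in for a real embedding model."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {w: v / norm for w, v in counts.items()}

def cosine(a, b):
    # Sparse dot product of two unit-normalized bag-of-words vectors.
    return sum(v * b.get(w, 0.0) for w, v in a.items())

class MiniRAG:
    def __init__(self):
        self.store = []  # (chunk, vector) pairs -- the "vector DB"

    def ingest(self, docs, chunk_size=50):
        # 1. Ingest: chunk each document, embed the chunks, store them.
        for doc in docs:
            words = doc.split()
            for i in range(0, len(words), chunk_size):
                chunk = " ".join(words[i:i + chunk_size])
                self.store.append((chunk, embed(chunk)))

    def retrieve(self, query, k=2):
        # 2. Retrieve: embed the query, rank chunks by cosine similarity.
        q = embed(query)
        ranked = sorted(self.store, key=lambda cv: cosine(q, cv[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]

    def answer(self, query, llm):
        # 3. Augment: inject the retrieved chunks into the prompt.
        context = "\n".join(self.retrieve(query))
        prompt = f"Answer using only this context:\n{context}\n\nQ: {query}"
        # 4. Generate: the LLM answers, grounded in the retrieved docs.
        return llm(prompt)
```

Swap in a real embedder and a FAISS index and this is essentially the 40-line version from the session: predictable, and perfectly adequate for a handful of policy PDFs.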
Continue reading on Dev.to



