
Building CDDBS — Part 2: Inside the Analysis Pipeline
The Pipeline Problem

Most LLM tutorials show you how to call an API and print the response. Real systems need more. You need to fetch data from external sources, construct prompts that constrain the output format, parse responses that don't always follow your instructions, persist results to a database, and handle every failure mode gracefully — all without blocking the user.

CDDBS solves this with a 6-stage background pipeline. This post walks through every stage with actual code from the production system.

Stage 1: Article Fetch

When a user requests an analysis of a media outlet, the first thing we need is content to analyze. CDDBS uses SerpAPI's Google News engine to fetch recent articles.

```python
# src/cddbs/pipeline/fetch.py (simplified)

# Map short date_filter codes to Google News "when:" query values
_WHEN_MAP = {
    "h": "1h",
    "d": "1d",
    "w": "7d",
    "m": "30d",
    "y": "1y",
}

def fetch_articles(outlet, country, num_articles=3, url=None, api_key=None):
    ...
```
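To make the `_WHEN_MAP` lookup concrete, here is a minimal sketch of how a fetch stage could assemble its SerpAPI query parameters. The helper name `build_search_params` and its `date_filter` argument are illustrative assumptions, not taken from the CDDBS source; the `engine=google_news` and `when:` pieces follow SerpAPI's and Google News's documented conventions.

```python
# Hypothetical helper (not from the CDDBS codebase) showing how a
# date_filter code could be folded into a SerpAPI Google News query.

_WHEN_MAP = {"h": "1h", "d": "1d", "w": "7d", "m": "30d", "y": "1y"}

def build_search_params(outlet, country, date_filter=None, api_key=None):
    """Compose the parameter dict for a SerpAPI Google News request."""
    q = outlet
    when = _WHEN_MAP.get(date_filter)
    if when:
        # Google News supports a "when:" operator to bound article recency
        q = f"{q} when:{when}"
    return {
        "engine": "google_news",  # SerpAPI's Google News engine
        "q": q,
        "gl": country,            # country code for localized results
        "api_key": api_key,
    }

params = build_search_params("Example Times", "us", date_filter="w", api_key="...")
print(params["q"])  # → Example Times when:7d
```

Keeping the parameter construction in a pure function like this makes the mapping trivially testable without touching the network; the actual HTTP call can then be a thin wrapper around it.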



