
I Stopped Letting Emails Poison My Extractor: The Pre-LLM Gate That Made the Rest of the Pipeline Reliable
I knew something was wrong the first time I saw a “candidate” come back with the recruiter’s phone number. Nothing was broken in the obvious places. Extraction ran. Persistence succeeded. The UI showed a clean-looking result. But the identity was wrong.

That moment is what this series is really about. This is Part 1 of How to Architect an Enterprise AI System (And Why the Engineer Still Matters). In Part 0, “The Day My AI Forgot Everything (So I Built a Context-Continuity Inference Stack),” I argued the thesis: models raise the floor; architecture is still the ceiling.

Here’s the first concrete decision that proved it in production: I stopped designing my extraction pipeline for clean input and started designing it for adversarial input. Not adversarial like “attackers.” Adversarial like real email: forwarded threads with duplicated headers; signature blocks with phone numbers that look more “extractable” than the actual subject’s; HTML bodies full of invisible control characters and we
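To make the idea of a pre-LLM gate concrete, here is a minimal sketch of what such a cleaning step could look like for the three kinds of noise named above. It is not the pipeline from this series: the function names (`strip_invisible`, `drop_duplicate_headers`, `cut_signature`, `pre_llm_gate`), the header pattern, and the reliance on the conventional `-- ` signature delimiter are all assumptions chosen for illustration, and real mail will need more robust rules.

```python
import re
import unicodedata

# Hypothetical header pattern for illustration; real forwarded threads vary widely.
HEADER_RE = re.compile(r"^(from|to|cc|sent|date|subject):\s", re.IGNORECASE)


def strip_invisible(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping newlines and tabs."""
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def drop_duplicate_headers(lines: list[str]) -> list[str]:
    """Keep the first copy of each header line; skip exact repeats."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in lines:
        if HEADER_RE.match(line):
            key = line.strip().lower()
            if key in seen:
                continue
            seen.add(key)
        kept.append(line)
    return kept


def cut_signature(lines: list[str]) -> list[str]:
    """Truncate at the conventional "-- " signature delimiter, if present.

    Assumption: the sender's client emits the delimiter. Many do not,
    so a real gate would need additional heuristics.
    """
    for i, line in enumerate(lines):
        if line.rstrip() == "--":
            return lines[:i]
    return lines


def pre_llm_gate(raw: str) -> str:
    """Normalize an email body before it is handed to an extractor."""
    text = strip_invisible(raw.replace("\r\n", "\n"))
    lines = text.split("\n")
    lines = drop_duplicate_headers(lines)
    lines = cut_signature(lines)
    return "\n".join(lines).strip()
```

With a gate like this in front of the model, a recruiter's phone number sitting below a signature delimiter never reaches the extractor, so it cannot be mistaken for the candidate's.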


