How to Build a Production-Ready PII Scrubber (No ML Required)

TL;DR: Built a privacy-first PII detector that finds and scrubs 11 types of sensitive data (emails, SSNs, credit cards, API keys, etc.) in pure Python regex. No spaCy. No transformers. <20ms per request. Full test suite included.

What You Need To Know

- The problem: Every AI request leaks metadata. Prompts are logged, stored, and analyzed. Users send sensitive data to OpenAI/Claude without thinking about privacy.
- The solution: A privacy proxy that scrubs PII before forwarding requests to LLM providers.
- This article: How to build the scrubber (Phase 1).
- Supported PII: EMAIL, PHONE, SSN, CREDIT_CARD, API_KEY_*, IPV4, IPV6, URL_WITH_TOKENS
- Performance: 14ms per 1000 chars. No external ML models.
- Tests: 21/21 passing (unit + integration).

Why Pattern Matching Beats NLP

Most engineers reach for spaCy NER when they think "PII detection." But in production:

- spaCy needs model downloads: 100MB+ on first load.
- Named entity recognition is slow: 100-500ms per request.
- It hallucinates, catching things that are …
