How to Build a Production-Ready PII Scrubber (No ML Required)

via Dev.to Python, by Tiamat

TL;DR: Built a privacy-first PII detector that finds and scrubs 11 types of sensitive data (emails, SSNs, credit cards, API keys, etc.) using pure Python regex. No spaCy. No transformers. Under 20 ms per request. Full test suite included.

What You Need To Know

- The problem: Every AI request leaks metadata. Prompts are logged, stored, and analyzed, and users send sensitive data to OpenAI/Claude without thinking about privacy.
- The solution: A privacy proxy that scrubs PII before forwarding requests to LLM providers.
- This article: How to build the scrubber (Phase 1).
- Supported PII: EMAIL, PHONE, SSN, CREDIT_CARD, API_KEY_*, IPV4, IPV6, URL_WITH_TOKENS
- Performance: 14 ms per 1,000 characters, with no external ML models.
- Tests: 21/21 passing (unit + integration).

Why Pattern Matching Beats NLP

Most engineers reach for spaCy NER when they think "PII detection." But for production:

- spaCy needs model downloads: 100 MB+ on first load.
- Named entity recognition is slow: 100-500 ms per request.
- It hallucinates: it catches things that are…
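The full pattern set is in the original article. As a rough illustration of the regex-only approach, here is a minimal Python sketch that runs a list of compiled patterns in order and replaces each hit with a [TYPE] placeholder. Everything in it is an assumption rather than the author's code: the names PATTERNS, scrub, and luhn_valid are hypothetical, the regexes are simplified, the two API_KEY_* variants (OpenAI-style "sk-" keys and AWS "AKIA" key IDs) are illustrative guesses, and IPv6 is left out for brevity. It also swaps in a Luhn checksum on card candidates, a common way to avoid scrubbing long digit runs that are not card numbers.

```python
import re
from typing import Dict, List, Tuple

# Illustrative, simplified patterns (assumptions, not the article's own).
# They are applied in list order; text already replaced by a placeholder
# no longer contains the original PII, so later patterns cannot re-match it.
PATTERNS: List[Tuple[str, re.Pattern]] = [
    # URL whose query string carries a token-like parameter.
    ("URL_WITH_TOKENS", re.compile(
        r"https?://\S+?[?&](?:access_token|api_key|token|key|secret)=[^\s&#]+(?:&\S*)?",
        re.IGNORECASE)),
    # Hypothetical API_KEY_* variants: OpenAI-style "sk-" keys, AWS access key IDs.
    ("API_KEY_OPENAI", re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b")),
    ("API_KEY_AWS", re.compile(r"\bAKIA[0-9A-Z]{16}\b")),
    ("EMAIL", re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    # 13-19 digits, optionally separated by single spaces or hyphens.
    ("CREDIT_CARD", re.compile(r"\b(?:\d[ -]?){12,18}\d\b")),
    ("IPV4", re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")),
    # North American style numbers with an optional +1 prefix.
    ("PHONE", re.compile(
        r"(?<!\w)(?:\+?1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b")),
]


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scrub(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace each detected PII span with a [TYPE] placeholder.

    Returns the scrubbed text and a count of replacements per type.
    """
    counts: Dict[str, int] = {}
    for label, pattern in PATTERNS:
        # Bind `label` as a default so the callback uses this iteration's value.
        def replace(match: re.Match, label: str = label) -> str:
            found = match.group(0)
            # Digit runs that fail the Luhn check are left untouched.
            if label == "CREDIT_CARD" and not luhn_valid(found):
                return found
            counts[label] = counts.get(label, 0) + 1
            return f"[{label}]"

        text = pattern.sub(replace, text)
    return text, counts


if __name__ == "__main__":
    sample = ("Email alice@example.com or call +1 555-123-4567. "
              "SSN 123-45-6789, card 4111 1111 1111 1111, "
              "key sk-abcdefghijklmnopqrstuvwxyz123456, server 192.168.1.10.")
    scrubbed, found = scrub(sample)
    print(scrubbed)
    # Email [EMAIL] or call [PHONE]. SSN [SSN], card [CREDIT_CARD], key [API_KEY_OPENAI], server [IPV4].
    print(found)
```

The Luhn filter is the main guard against false positives in this sketch: a 16-digit order number that fails the checksum passes through unchanged. Because the whole scrubber is a handful of precompiled patterns, there is also no model to download or load at startup, which is the trade-off the article argues for.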

