
How to Protect PII in LLM Pipelines with Python
Tokenize personal data before it reaches the model, then restore it in the output.

If we're building AI features that handle customer data (support tickets, medical intake, financial queries), we have a problem. Every prompt we send to an LLM API is logged, cached, and potentially used for training. Names, emails, SSNs, medical records: all of it lands on someone else's servers.

GDPR says we can't send EU personal data to third-party processors without safeguards. HIPAA says protected health information must be de-identified. And even outside regulated industries, sending raw customer data to OpenAI or Anthropic is a liability we shouldn't accept.

Here's what a naive implementation looks like:

```python
from openai import OpenAI

client = OpenAI()

# Every name, email, and SSN in this text hits OpenAI's servers
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": (
            "Summarize this support ticket: Customer John Doe (john@acme.co"
            # ...the rest of the raw ticket text is truncated in the source
        ),
    }],
)
```
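The tokenize-then-restore pattern the title describes can be sketched as follows. This is a minimal illustration, not the article's implementation: the `PII_PATTERNS` table, `tokenize_pii`, and `restore_pii` names are hypothetical, and the regexes only cover emails and SSNs; a production pipeline would detect names and medical details with a dedicated PII detector (e.g. Microsoft Presidio or an NER model) rather than regexes alone.

```python
import re
from typing import Dict, Tuple

# Hypothetical patterns for illustration; real pipelines need a proper
# PII detector for entities like names that regexes cannot catch.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]*\w"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def tokenize_pii(text: str) -> Tuple[str, Dict[str, str]]:
    """Replace each PII match with a placeholder token.

    Returns the masked text plus a local "vault" mapping tokens back to
    the original values. Only the masked text leaves our infrastructure.
    """
    vault: Dict[str, str] = {}
    counter = 0

    def make_sub(kind: str):
        def _sub(match: re.Match) -> str:
            nonlocal counter
            counter += 1
            token = f"<{kind}_{counter}>"
            vault[token] = match.group(0)
            return token
        return _sub

    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(make_sub(kind), text)
    return text, vault

def restore_pii(text: str, vault: Dict[str, str]) -> str:
    """Swap placeholders in the model's output back to the real values."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

In use, `tokenize_pii` runs before the API call and `restore_pii` runs on the completion, so the provider only ever sees placeholders like `<EMAIL_1>`; the vault stays in process memory (or an encrypted store) on our side.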
Continue reading on Dev.to (Python)




