
We ran 109 tests to measure how PII protection methods affect LLM output quality. Here's what we learned and what we built.
**TL;DR:** We ran 109 tests across GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro to measure how different PII protection methods affect LLM output quality. Placeholder masking ([PERSON], [SSN]) dropped output quality to 54-68%. Deterministic tokenization, where each entity gets its own unique opaque token, preserved 91-96%. We also found that leaving PII labels like "SSN" next to tokenized values causes safety refusals in 15-20% of cases.

We built NoPII based on these findings: a reverse proxy that tokenizes PII before prompts reach the model and detokenizes responses on the way back. One base_url change in your existing SDK. Free tier, no credit card. Full paper here: Link

If you are building anything on top of LLM APIs that touches real user data, you have probably had the conversation. The one where the prototype works, the team is excited, and then someone from security or legal asks what exactly is being sent to OpenAI or Anthropic or whichever provider you are using. That question tends to bring the project to a halt.
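To make the tokenize-then-detokenize round trip concrete, here is a minimal sketch of deterministic tokenization. This is illustrative only, not NoPII's actual implementation or API: the class, token format, and in-memory vault are assumptions, and a real proxy would also handle PII detection, persistence, and multiple conversations.

```python
import hashlib

class PIITokenizer:
    """Toy deterministic tokenizer: same value always maps to the same
    opaque token, so the model can track an entity across a conversation
    without ever seeing the raw PII."""

    def __init__(self):
        self.vault = {}  # token -> original value

    def tokenize(self, value: str, entity_type: str) -> str:
        # Opaque token with no human-readable PII label like "SSN" next
        # to it; per the findings above, visible labels beside tokenized
        # values triggered safety refusals in 15-20% of cases.
        digest = hashlib.sha256(value.encode()).hexdigest()[:8]
        token = f"tkn_{entity_type.lower()}_{digest}"
        self.vault[token] = value
        return token

    def detokenize(self, text: str) -> str:
        # Restore the originals in the model's response on the way back.
        for token, value in self.vault.items():
            text = text.replace(token, value)
        return text

t = PIITokenizer()
prompt = f"Draft a welcome letter for {t.tokenize('Jane Doe', 'PERSON')}."
# The model only ever sees the token; suppose it echoes it back:
response = f"Dear {t.tokenize('Jane Doe', 'PERSON')}, welcome aboard."
print(t.detokenize(response))  # -> Dear Jane Doe, welcome aboard.
```

In the proxy setup described above, both steps happen transparently: the SDK call is unchanged except for pointing base_url at the proxy, which tokenizes the outbound prompt and detokenizes the inbound response.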


