
Why Your Profanity Filter Fails Against Unicode (And How to Fix It)
Most profanity filters only check raw input. That’s the problem.

You can block `fuck`. But what about:

- `fu\u0441k` (Cyrillic “с” instead of Latin “c”)
- `ｆｕｃｋ` (fullwidth Unicode characters)
- `f.u.c.k` (separator bypass)
- `Fr33 m0ney` (leet-speak)
- `fuuuuck` (character stretching)

They all bypass typical word-list filters. The issue isn’t your regex. It’s the order of operations.

## Normalize First. Validate Second.

Before checking for profanity or spam, input should be normalized:

1. Unicode NFKC normalization
2. Zero-width character removal
3. Separator stripping
4. Homoglyph mapping
5. Leet-speak normalization
6. Repetition reduction

After normalization, all evasions collapse into a canonical form. Then your profanity/spam logic actually works.

## What I Built

I created @marslanmustafa/input-shield — a zero-dependency TypeScript validation package that:

- Detects Unicode homoglyph attacks
- Catches leet-based spam
- Blocks stretched profanity
- Detects gibberish (e.g. `asdfghjkl`)
- Supports Zod integration
- Validates HTML email content
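The normalize-then-validate order described above can be sketched in plain TypeScript. This is a minimal illustration, not the input-shield API: the function names are hypothetical, and the homoglyph and leet tables are tiny example subsets of what a real package would ship.

```typescript
// Illustrative subsets only -- a real filter needs much larger tables.
const HOMOGLYPHS: Record<string, string> = {
  "\u0441": "c", // Cyrillic es  -> Latin c
  "\u043E": "o", // Cyrillic o   -> Latin o
  "\u0430": "a", // Cyrillic a   -> Latin a
};

const LEET: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s",
};

function normalize(input: string): string {
  let s = input.normalize("NFKC");                       // 1. NFKC folds fullwidth forms, etc.
  s = s.replace(/[\u200B-\u200D\uFEFF]/g, "");           // 2. strip zero-width characters
  s = s.toLowerCase();
  s = s.replace(/[.\-_*\s]+/g, "");                      // 3. strip separators (f.u.c.k -> fuck)
  s = [...s].map((ch) => HOMOGLYPHS[ch] ?? ch).join(""); // 4. map homoglyphs to Latin
  s = [...s].map((ch) => LEET[ch] ?? ch).join("");       // 5. undo leet-speak
  s = s.replace(/(.)\1{2,}/g, "$1");                     // 6. collapse 3+ repeats (fuuuuck -> fuck)
  return s;
}

const BLOCKLIST = new Set(["fuck"]);

function isBlocked(input: string): boolean {
  // Validate the canonical form, never the raw input.
  return BLOCKLIST.has(normalize(input));
}
```

With this ordering, every evasion from the list above (`fu\u0441k`, `ｆｕｃｋ`, `f.u.c.k`, `fuuuuck`) collapses to the same canonical string before the blocklist check runs. Note that repetition collapsing only fires on three or more repeats, so doubled letters in legitimate words survive.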
Continue reading on Dev.to Webdev




