
When Regex Meets the DOM (And Suddenly It’s Not Simple Anymore)
I recently built a custom in-page “Ctrl + F”-style search and highlight feature. The goal sounded simple:

- Support multi-word queries
- Prefer full phrase matches
- Fall back to individual token matches
- Highlight results in the DOM
- Skip <code> and <pre> blocks

In my head? “Easy. Just build a regex.”

Step 1: Build the Regex

If a user searches:

    power shell

I generate a pattern like:

    power[\s\u00A0]+shell|power|shell

The logic:

- At each position, try to match the full phrase first
- If that fails, match the individual tokens

On paper? Clean. In isolation? Works.

Step 2: Enter the DOM

This is where things escalated. Instead of just running string.match(), I had to:

- Walk the DOM
- Avoid the header UI
- Avoid <pre>, <code>, <script>, and <style>
- Avoid breaking syntax highlighting
- Replace only text nodes
- Preserve structure

That meant using a TreeWalker:

    const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
      acceptNode(node) {
        const p = node.parentElement;
        if (!p) return NodeFilter.F…
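The Step 1 pattern can be generated from the raw query roughly like this. It is a minimal sketch, not the author's code: `escapeRegExp` and `buildSearchRegex` are hypothetical names, and case-insensitive matching is an assumption. Escaping matters because a query such as `c++` would otherwise be a regex syntax error. Also worth knowing: in JavaScript, `\s` already matches U+00A0, so the explicit `\u00A0` in the character class is redundant but harmless.

```javascript
// Hypothetical helper: escape regex metacharacters so user input such as
// "c++" is matched literally instead of being parsed as regex syntax.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a phrase-first alternation, e.g. "power shell" becomes
//   power[\s\u00A0]+shell|power|shell
// Alternatives are tried left to right at each position, so wherever the
// full phrase matches, it beats the single tokens.
function buildSearchRegex(query) {
  const tokens = query.trim().split(/\s+/).filter(Boolean).map(escapeRegExp);
  if (tokens.length === 0) return null;

  const alternatives = [];
  if (tokens.length > 1) {
    // Any run of whitespace (including non-breaking spaces) between words.
    alternatives.push(tokens.join("[\\s\\u00A0]+"));
  }

  // Deduplicate, then put longer tokens first so "powershell" is not cut
  // short by "power". The sort is stable, so equal lengths keep query order.
  const singles = [...new Set(tokens)].sort((a, b) => b.length - a.length);
  alternatives.push(...singles);

  // "g" finds every match; "i" (case-insensitive) is an assumption.
  return new RegExp(alternatives.join("|"), "gi");
}
```

With this ordering, `"PowerShell power\u00A0shell".match(buildSearchRegex("power shell"))` yields the tokens `Power` and `Shell` for the compound word and the whole phrase for the second occurrence.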
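For the Step 2 filtering, one way to express the skip rules is a small predicate plus a walker that collects nodes before any of them are modified. This is a hedged sketch rather than the author's filter: `SKIP_SELECTOR`, `shouldSkipTextNode`, `collectTextNodes`, and the `.site-header` class (standing in for the header UI) are all hypothetical. Calling `closest()` on the parent covers text nested anywhere inside a skipped element, which is what keeps syntax-highlighted spans inside `<pre><code>` untouched.

```javascript
// Hypothetical selector for regions that must never be highlighted.
// ".site-header" is an assumed class standing in for the page's header UI.
const SKIP_SELECTOR = "pre, code, script, style, .site-header";

// Returns true when a text node should be left alone.
function shouldSkipTextNode(node) {
  const parent = node.parentElement;
  if (!parent) return true; // no element parent, nothing safe to wrap
  if (!node.nodeValue.trim()) return true; // whitespace-only text
  // closest() tests the parent itself and every ancestor, so text inside
  // <pre><code><span class="token">…</span></code></pre> is skipped too.
  return parent.closest(SKIP_SELECTOR) !== null;
}

// Gather accepted text nodes up front. Replacing nodes while the walker is
// still traversing can make it lose its place, so mutation happens later.
function collectTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode(node) {
      return shouldSkipTextNode(node)
        ? NodeFilter.FILTER_REJECT
        : NodeFilter.FILTER_ACCEPT;
    },
  });
  const nodes = [];
  let current;
  while ((current = walker.nextNode())) nodes.push(current);
  return nodes;
}
```

Since text nodes have no children, `FILTER_REJECT` and `FILTER_SKIP` behave the same here; the difference only matters when the walker also visits elements.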
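The “replace only text nodes, preserve structure” requirement from Step 2 can be met by splitting each text node's content into plain and matched pieces and swapping the node for a fragment, never touching `innerHTML`. Again a minimal sketch under assumptions: `splitByMatches` and `highlightTextNode` are hypothetical names, `<mark>` is an assumed highlight element, and the regex must carry the `g` flag because `String.prototype.matchAll` throws a `TypeError` without it.

```javascript
// Split a string into plain and matched segments using a global regex.
// Working on plain strings means existing markup is never re-parsed.
function splitByMatches(text, regex) {
  const segments = [];
  let lastIndex = 0;
  for (const match of text.matchAll(regex)) {
    if (match[0] === "") continue; // ignore zero-length matches
    if (match.index > lastIndex) {
      segments.push({ text: text.slice(lastIndex, match.index), match: false });
    }
    segments.push({ text: match[0], match: true });
    lastIndex = match.index + match[0].length;
  }
  if (lastIndex < text.length) {
    segments.push({ text: text.slice(lastIndex), match: false });
  }
  return segments;
}

// Replace one text node with text nodes and <mark> elements.
// Returns the number of highlights created (0 if nothing matched).
function highlightTextNode(textNode, regex) {
  const segments = splitByMatches(textNode.nodeValue, regex);
  const hits = segments.filter((s) => s.match).length;
  if (hits === 0) return 0;

  const fragment = document.createDocumentFragment();
  for (const seg of segments) {
    if (seg.match) {
      const mark = document.createElement("mark");
      mark.textContent = seg.text; // textContent, so no HTML is injected
      fragment.appendChild(mark);
    } else {
      fragment.appendChild(document.createTextNode(seg.text));
    }
  }
  textNode.replaceWith(fragment);
  return hits;
}
```

In practice each text node the TreeWalker accepts would be passed through `highlightTextNode` once, after the walk has finished, so the new `<mark>` elements are never revisited.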
Continue reading on Dev.to JavaScript




