The OSS ER Bargain: What Entity Resolution Actually Costs You
Benchmarking dedupe vs GoldenMatch on 500,000 CMS provider records

The National Plan and Provider Enumeration System (NPPES) publishes one of the largest open healthcare directories in the world: 6+ million U.S. providers, updated monthly, with names spelled four different ways, addresses that drift across quarters, and enough Smiths and Garcias to keep any blocking algorithm honest. It's a reasonable stand-in for the kind of data most organizations actually have: real, messy, and big enough to hurt.

I wanted to see what it costs to resolve a dataset like this with traditional open-source entity resolution versus a holistic approach. So I took 500,000 randomly sampled records from the March 2026 NPPES release and pointed two tools at them: dedupe, the canonical Python OSS deduper, and GoldenMatch, the matching engine at the heart of the Golden Suite.

This isn't a precision/recall bake-off. NPPES ships no ground-truth dupl
Continue reading on Dev.to
