
Entity Resolution at Scale: Matching Products Across Amazon, Reddit, and RTINGS
"AirPods Pro 2," "Apple AirPods Pro (2nd Generation)," "AirPods Pro USB-C": same product, three different names. Entity resolution, the task of figuring out that different strings refer to the same real-world thing, is one of the hardest problems in product data engineering. At SmartReview, we match products across 50+ review sources, each with its own naming conventions, categorization, and data formats. Here's how we solved it without spending six months building a custom ML model.

The Problem Space

Consider matching products across these sources:

| Source   | Product Name                                              | Format                     |
| -------- | --------------------------------------------------------- | -------------------------- |
| Amazon   | Apple AirPods Pro (2nd Generation) - MagSafe Case (USB-C) | Full name + SKU details    |
| Reddit   | AirPods Pro 2                                             | Colloquial shorthand       |
| RTINGS   | Apple AirPods Pro 2nd Gen                                 | Abbreviated formal         |
| YouTube  | NEW AirPods Pro 2 USB-C Review                            | Title with marketing fluff |
| Best Buy | Apple - AirPods Pro 2 - White                             | Brand-prefixed with color  |

All five refer to the same product. A naive string match would treat them as five different products.

Our Three
Continue reading on Dev.to



