
Scraped 1200 products. 87 had prices like '$19.99extra'.
Got hired to scrape competitor pricing for an ecommerce client. Grabbed product names, prices, and availability from 3 different sites. Ran the script overnight. Woke up to a nice CSV with 1200 rows. Felt good.

Client downloads it. Five minutes later I get a message: "Your prices are broken."

Fun times.

Turns out HTML is a mess

Checked the CSV. Most prices looked fine: $19.99, $45.00, $129.95. But scattered throughout were absolute gems:

$19.99extra
$45.00–
Price: $129.95
$89 (where are the decimals)
FREE (not even a number)

The sites had wildly different HTML. One put "extra savings" text inside the same span as the price. Another stuck a dash after clearance prices for no reason. The third one prefixed everything with "Price:" like it wasn't obvious already.

My scraper just grabbed .innerText and called it a day. Zero cleaning. Bad call.

Tried the obvious fix first

Figured I'd just strip the junk:

price = price.replace('extra',
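
The snippet above is cut off in this excerpt, so what follows is not the author's fix. It is a minimal sketch of one alternative to chained `replace` calls: pull the number out with a regular expression instead of trying to list every piece of junk. It assumes Python (the original language isn't confirmed), US-style prices with an optional `$` and comma thousands separators, and that "FREE" should map to 0.00, which is a business decision you may want to make differently. `parse_price` and `PRICE_RE` are hypothetical names.

```python
import re
from decimal import Decimal
from typing import Optional

# Optional dollar sign, then digits (with optional comma thousands
# separators), then an optional decimal part of one or two digits.
PRICE_RE = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})+|\d+)(\.\d{1,2})?")


def parse_price(raw: str) -> Optional[Decimal]:
    """Extract a price from messy scraped text, or return None if there is none."""
    text = raw.strip()

    # Assumption: a bare "FREE" means a price of zero.
    if text.lower() == "free":
        return Decimal("0.00")

    match = PRICE_RE.search(text)
    if match is None:
        return None  # nothing numeric found; flag the row for manual review

    whole = match.group(1).replace(",", "")
    fraction = match.group(2) or ""
    # Normalize to two decimal places so "$89" becomes 89.00.
    return Decimal(whole + fraction).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for sample in ["$19.99extra", "$45.00–", "Price: $129.95", "$89", "FREE"]:
        print(repr(sample), "->", parse_price(sample))
```

Returning `None` rather than guessing keeps unparseable rows visible, so they can be counted and checked before the CSV goes to a client.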
Continue reading on Dev.to Tutorial



