
Feed Rescue: Converting Raw Ulta Scrapes into Google Merchant Center XML
You've bypassed the anti-bot shields, rotated your proxies, and extracted thousands of product records from Ulta.com. Your reward is a massive JSONL file sitting on your hard drive. That's a victory for data extraction, but it's a dead end for a marketing team.

Ad platforms like Google Merchant Center (GMC) don't accept JSONL. One of the formats GMC does accept is a strictly structured, validated XML feed (RSS 2.0 or Atom 1.0). If your data doesn't match its schema exactly, from currency codes to specific availability enums, your products won't show up in Google Shopping.

This "Feed Rescue" phase involves taking the raw output from the Ulta.com-Scrapers repository and building a Python transformation pipeline that generates a production-ready Google Shopping feed.

Phase 1: Analyzing the Source Data

Before writing any XML, we need to look at the raw material. The scrapers in the Ulta.com-Scrapers repository, specifically the Selenium and Playwright versions, use a ScrapedData dataclass that outputs a con
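The first transformation step is turning loosely formatted scraped values into the exact strings GMC expects. Below is a minimal sketch, assuming (hypothetically) that each JSONL record carries a scraped price string such as "$24.00" and an availability string such as "In Stock" or a schema.org URL. The helper names (`parse_jsonl_lines`, `normalize_price`, `normalize_availability`), the mapping table, and the USD default are illustrative assumptions, not code from the Ulta.com-Scrapers repository. Prices become GMC's "amount plus ISO 4217 code" form, and availability is mapped onto GMC's enum values.

```python
import json
import re
from decimal import Decimal

# GMC's accepted availability values.
GMC_AVAILABILITY = {"in_stock", "out_of_stock", "preorder", "backorder"}

# Hypothetical mapping from scraped availability text to GMC values.
# Keys are lower-cased; schema.org URLs are reduced to their last path segment.
AVAILABILITY_MAP = {
    "in stock": "in_stock",
    "instock": "in_stock",
    "out of stock": "out_of_stock",
    "outofstock": "out_of_stock",
    "sold out": "out_of_stock",
    "preorder": "preorder",
    "backorder": "backorder",
}

# First run of digits, optional thousands separators, optional decimals.
PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_jsonl_lines(lines):
    """Yield one dict per non-blank JSONL line (works on an open file handle)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def normalize_price(raw, currency="USD"):
    """Turn '$1,299.5' into '1299.50 USD'. USD is an assumption for Ulta.com data."""
    match = PRICE_RE.search(raw)
    if match is None:
        raise ValueError(f"no numeric price in {raw!r}")
    amount = Decimal(match.group().replace(",", ""))
    return f"{amount.quantize(Decimal('0.01'))} {currency}"


def normalize_availability(raw):
    """Map text like 'In Stock' or 'https://schema.org/InStock' to a GMC enum."""
    key = raw.strip().lower().rsplit("/", 1)[-1]
    value = AVAILABILITY_MAP.get(key)
    if value not in GMC_AVAILABILITY:
        raise ValueError(f"unmapped availability {raw!r}")
    return value
```

Raising on unmapped values, rather than silently defaulting to `in_stock`, keeps a bad scrape from advertising products that can't actually be bought; the failing records can be logged and fixed before the feed is generated.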
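Once records are normalized, producing the XML feed is mostly a matter of emitting RSS 2.0 with Google's `g:` namespace (`http://base.google.com/ns/1.0`). The sketch below uses only the standard library's `xml.etree.ElementTree`; the `build_feed` name and the `REQUIRED` field subset are illustrative assumptions, and a real feed should still be checked against Google's product data specification.

```python
import xml.etree.ElementTree as ET

# Google's product namespace, bound to the conventional "g" prefix.
G_NS = "http://base.google.com/ns/1.0"
ET.register_namespace("g", G_NS)

# Hypothetical subset of attributes we refuse to ship without; consult
# Google's product data specification for the full, current list.
REQUIRED = ("id", "title", "description", "link", "image_link", "availability", "price")


def build_feed(items, title, link, description):
    """Return an RSS 2.0 document (UTF-8 bytes) with one <item> per product dict."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    for product in items:
        missing = [field for field in REQUIRED if not product.get(field)]
        if missing:
            raise ValueError(f"product {product.get('id')!r} is missing {missing}")
        item = ET.SubElement(channel, "item")
        for field, value in product.items():
            # "{namespace}tag" is ElementTree's syntax for a namespaced element;
            # it serializes as <g:field>, and the text is XML-escaped automatically.
            ET.SubElement(item, f"{{{G_NS}}}{field}").text = str(value)

    # xml_declaration requires Python 3.8+.
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = {  # hypothetical product, already normalized
        "id": "ulta-0001",
        "title": "Example Lip Gloss",
        "description": "Placeholder description.",
        "link": "https://example.com/p/ulta-0001",
        "image_link": "https://example.com/img/ulta-0001.jpg",
        "availability": "in_stock",
        "price": "24.00 USD",
    }
    print(build_feed([sample], "Ulta feed", "https://example.com", "Demo").decode("utf-8"))
```

Letting ElementTree handle escaping matters for beauty catalogs, where titles routinely contain characters like `&` that would break hand-concatenated XML.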
Continue reading on Dev.to Python