
Why Most Web Scrapers Break (And the 4-Tier Fix)
Your scraper worked perfectly for 3 months. Then one morning, it returns empty data. The target site changed its HTML.

This happens because CSS selectors are fragile by design. They depend on class names, element hierarchy, and HTML structure, all of which change during routine redesigns.

After maintaining 77 production scrapers, here's what actually works long-term.

The 4 Reliability Tiers

Tier 1: Public JSON APIs (99.9% uptime)
Sites like Reddit, YouTube, and Hacker News expose JSON endpoints. These are stable because the site's own mobile app uses them.

Tier 2: RSS Feeds (99% uptime)
Google News, blogs, and podcasts all publish RSS, a standard that hasn't changed in 20 years.

Tier 3: JSON-LD Structured Data (95% uptime)
Embedded in the HTML for Google's search results. Follows Schema.org standards. Changes are rare and backwards-compatible.

Tier 4: CSS Selectors (70-90% uptime)
The traditional approach. Breaks on every redesign. Should be your last resort.

Real Examples from My 77 Scraper
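
To make Tiers 2 and 3 concrete, here is a minimal sketch using only the Python standard library. It is not the author's code: the function names (`extract_json_ld`, `parse_rss_titles`) and the sample HTML and RSS strings are made up for illustration, and a real scraper would fetch the page over HTTP first. The point is that neither function looks at class names or element hierarchy, so a redesign that renames `price-v2` would not affect them.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip a malformed block instead of failing the whole page


def extract_json_ld(html):
    """Tier 3: return every JSON-LD object embedded in the page."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items


def parse_rss_titles(xml_text):
    """Tier 2: return the <title> of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [item.findtext("title", default="") for item in root.iter("item")]


# Hypothetical sample inputs, standing in for a fetched page and feed.
SAMPLE_HTML = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Product", "name": "Example Widget", "offers": {"price": "19.99"}}'
    '</script></head><body><div class="price-v2">19.99</div></body></html>'
)

SAMPLE_RSS = (
    '<rss version="2.0"><channel><title>Feed</title>'
    "<item><title>First post</title></item>"
    "<item><title>Second post</title></item>"
    "</channel></rss>"
)
```

Calling `extract_json_ld(SAMPLE_HTML)` yields a list with one dictionary whose `name` is `Example Widget`, and `parse_rss_titles(SAMPLE_RSS)` yields the two item titles. A reasonable design is to try these first and fall back to CSS selectors only when a page offers neither.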



