
I Built 77 Web Scrapers — Here Are the 5 Patterns That Never Break
After building 77 production web scrapers, I've learned that most scrapers break within weeks. But a few patterns make them nearly indestructible.

Pattern 1: API-First, HTML-Last

Before writing a single CSS selector, check whether the site has a JSON API.

    // Instead of this (breaks on redesign; $ is a Cheerio/jQuery-style handle on the page HTML):
    const title = $("h1.product-title").text();

    // Do this (survives redesigns, because it never touches the markup):
    const res = await fetch("https://site.com/api/products/123");
    const { title } = await res.json();

Examples of hidden APIs:

- YouTube: youtubei/v1/search (Innertube API)
- Reddit: append .json to any URL
- Hacker News: hn.algolia.com/api/v1/search

Pattern 2: Use Official Public APIs First

9 APIs that need NO authentication, and what you get from each:

- Wikipedia: Market overviews, article snippets
- Google News RSS: Latest 100 news articles
- GitHub Search: Repos, stars, tech landscape
- HN Algolia: Community discussions, points
- Stack Overflow: Developer questions, votes
- arXiv: Academic papers, abstracts
- npm Registry: Package ecosystem data
- R…
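Pattern 1's Reddit tip can be sketched as two small helpers: one rewrites a page URL into its .json twin, the other flattens the returned listing. The names toRedditJsonUrl, parseRedditListing, and fetchSubreddit are hypothetical, and the listing shape ({ data: { children: [{ kind, data }] } }) is an assumption about Reddit's public listing JSON, not something the article specifies. Reddit may also throttle anonymous clients, so the User-Agent below is only a placeholder.

```javascript
// Hypothetical helpers sketching Reddit's ".json" trick (names are illustrative).

// Turn a Reddit page URL into its JSON twin, keeping any query string.
// "/r/programming/top/?t=week" -> "/r/programming/top.json?t=week"
function toRedditJsonUrl(pageUrl) {
  const url = new URL(pageUrl);
  const path = url.pathname.replace(/\/+$/, ""); // drop trailing slashes
  url.pathname = path.endsWith(".json") ? path : `${path}.json`;
  return url.toString();
}

// Flatten a subreddit listing into plain records.
// Assumes the Listing shape { data: { children: [{ kind, data }] } };
// a single post's comments page returns an array of two listings instead.
function parseRedditListing(listing) {
  return listing.data.children
    .filter((child) => child.kind === "t3") // "t3" marks a link/post
    .map(({ data }) => ({
      title: data.title,
      score: data.score,
      url: `https://www.reddit.com${data.permalink}`,
    }));
}

// Network wrapper (Node 18+ provides a global fetch).
async function fetchSubreddit(pageUrl) {
  const res = await fetch(toRedditJsonUrl(pageUrl), {
    headers: { "User-Agent": "my-scraper/0.1 (contact: you@example.com)" }, // placeholder UA
  });
  if (!res.ok) throw new Error(`Reddit returned HTTP ${res.status}`);
  return parseRedditListing(await res.json());
}
```

Keeping the URL rewrite and the parsing as pure functions means they can be tested against a saved JSON fixture without hitting Reddit at all.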
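The Hacker News endpoint (hn.algolia.com/api/v1/search) returns JSON with a hits array directly. The following is a minimal sketch of querying it. The query, tags, and hitsPerPage parameters and the hit fields (title, points, num_comments, url, objectID) come from the Algolia HN API. The helper names and default values are assumptions chosen for illustration.

```javascript
// Build a Hacker News search URL for the Algolia-backed API.
// Defaults (stories only, 20 hits) are arbitrary choices for this sketch.
function hnSearchUrl(query, { tags = "story", hitsPerPage = 20 } = {}) {
  const params = new URLSearchParams({ query, tags, hitsPerPage: String(hitsPerPage) });
  return `https://hn.algolia.com/api/v1/search?${params}`;
}

// Keep only the fields a scraper usually needs from each hit.
function parseHnHits(json) {
  return json.hits.map((hit) => ({
    title: hit.title,
    points: hit.points ?? 0,
    comments: hit.num_comments ?? 0,
    // Text posts (e.g. many Ask HN threads) have no external url; link the thread instead.
    url: hit.url || `https://news.ycombinator.com/item?id=${hit.objectID}`,
  }));
}

// Network wrapper (Node 18+ provides a global fetch).
async function searchHn(query, options) {
  const res = await fetch(hnSearchUrl(query, options));
  if (!res.ok) throw new Error(`HN Algolia returned HTTP ${res.status}`);
  return parseHnHits(await res.json());
}
```

Note that there is no selector anywhere in this code: if the news.ycombinator.com markup changes, nothing here is affected.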
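For Pattern 2, one way to keep the no-auth sources in one place is a small table of URL builders. This hypothetical PUBLIC_APIS object covers six of the listed sources. The endpoints are real public ones: Wikipedia's REST page/summary, Google News RSS search, GitHub's search/repositories, Stack Exchange's /2.3/search/advanced, arXiv's export.arxiv.org/api/query, and the npm registry's /-/v1/search. However, the particular parameters and defaults are this sketch's choices, not the article's, and each service applies its own anonymous rate limits.

```javascript
// Hypothetical catalog: one URL builder per no-auth source.
// Parameters and defaults are illustrative; check each API's docs and rate limits.
const PUBLIC_APIS = {
  // Wikipedia REST API: summary (lead extract) of one article; titles use underscores.
  wikipedia: (title) =>
    `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(title.replace(/ /g, "_"))}`,
  // Google News search as an RSS (XML) feed.
  googleNews: (q) => `https://news.google.com/rss/search?${new URLSearchParams({ q })}`,
  // GitHub repository search, most-starred first (anonymous search is tightly rate-limited).
  github: (q) =>
    `https://api.github.com/search/repositories?${new URLSearchParams({ q, sort: "stars", order: "desc" })}`,
  // Stack Exchange API v2.3: Stack Overflow questions sorted by votes.
  stackOverflow: (q) =>
    `https://api.stackexchange.com/2.3/search/advanced?${new URLSearchParams({
      q,
      order: "desc",
      sort: "votes",
      site: "stackoverflow",
    })}`,
  // arXiv API (Atom XML); `query` uses arXiv's own syntax, e.g. "all:scraping".
  arxiv: (query) =>
    `https://export.arxiv.org/api/query?${new URLSearchParams({ search_query: query, max_results: "10" })}`,
  // npm registry full-text package search.
  npm: (text) => `https://registry.npmjs.org/-/v1/search?${new URLSearchParams({ text, size: "20" })}`,
};
```

Google News and arXiv return XML rather than JSON, so a caller would parse those two differently from the rest.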
Continue reading on Dev.to Tutorial



