
5 Architectural Patterns for Building Scrapers That Never Break
I've published 77 free web scrapers and 15 MCP servers on Apify Store. Every one takes an API-first approach: JSON APIs, RSS feeds, JSON-LD, or open protocol APIs instead of fragile CSS selectors. Here are the most interesting architectural patterns I discovered:

Pattern 1: Hidden JSON Endpoints
Used in: Reddit, YouTube, most modern SPAs

Most sites have internal JSON APIs that their frontend calls. You can discover the URL patterns in browser DevTools → Network tab → XHR/Fetch. On Reddit, append .json to the URL; on YouTube, use the Innertube API. These endpoints are stable because the site's own app depends on them.

Pattern 2: RSS as a Scraping Shortcut
Used in: Google News, blogs, podcasts, most CMS platforms

RSS feeds return structured XML with a title, link, date, and description for each item. One HTTP request returns 10-50 items, with no JavaScript rendering needed. Google News RSS is particularly powerful: search any keyword and get the 10 latest articles with their sources.

Pattern 3: JSON-LD Structured Data
Used in: Trustpilot, e-commerce, restaurant...
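
To make Pattern 2 concrete, here is a minimal sketch of pulling the title, link, date, and description out of an RSS 2.0 feed with Python's standard library. The sample feed, the `parse_rss_items` helper, and the example.com URLs are illustrative assumptions, not taken from any of the published scrapers; a real feed may use namespaces or omit fields, which this sketch only handles by returning empty strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, inlined so the example runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first</link>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
      <description>Summary of the first post.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/second</link>
      <pubDate>Tue, 02 Jan 2024 10:00:00 GMT</pubDate>
      <description>Summary of the second post.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict per <item> with title, link, date, and description."""
    root = ET.fromstring(xml_text)
    items = []
    # iter("item") walks the whole tree, so it finds items under <channel>.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "date": item.findtext("pubDate", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["date"], entry["title"], entry["link"])
```

Against a live feed, the same function could be given the body of a single HTTP response (for example, the bytes returned by `urllib.request.urlopen(...).read()`), since `ET.fromstring` accepts both text and bytes. No selectors or browser rendering are involved, which is the point of the pattern.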
Continue reading on Dev.to




