
How I built a scraper that actually works on Cloudflare sites
I was building a research agent. It needed to read news sites, pull earnings reports, and scrape job listings. Three hours in, half my URLs were returning empty strings or Cloudflare challenge pages. Not errors. Just nothing useful. That is when I realized the scraping ecosystem is mostly broken for anything that is not a static blog.

Why scraping keeps failing

Three things are killing most scrapers right now.

JavaScript rendering. A lot of sites ship an empty HTML shell and hydrate it in the browser with React or Vue. Fetch the URL directly and you get a div with an id and nothing else.

Bot detection. Cloudflare, PerimeterX, and DataDome fingerprint your browser: missing plugins, the wrong screen resolution, suspiciously perfect mouse timing. A vanilla Playwright script fails all of these checks in about 30 seconds.

IP reputation. Datacenter IPs are flagged before your code even runs. Traffic from AWS, Hetzner, or DigitalOcean is blocked by default on half the sites worth scraping.

You can fight each of these individually.
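Before fighting any of these, it helps to know which one a given URL is actually hitting. The sketch below is a minimal, stdlib-only triage function that sorts a fetched page into "empty", "challenge", "js_shell", or "ok". The challenge markers (the "Just a moment..." title, the /cdn-cgi/challenge-platform/ path, "cf-chl" tokens) are heuristics based on commonly observed Cloudflare interstitials, not an official list. The 200-character visible-text threshold and the name classify_response are hypothetical choices for illustration.

```python
from html.parser import HTMLParser

# Heuristic markers commonly seen on Cloudflare challenge pages.
# Assumption: observed patterns, not an official or exhaustive list.
CHALLENGE_MARKERS = (
    "just a moment...",
    "/cdn-cgi/challenge-platform/",
    "cf-chl",
)

# Hypothetical threshold: pages with less visible text than this are
# treated as an unrendered JavaScript shell.
MIN_VISIBLE_CHARS = 200


class _TextCounter(HTMLParser):
    """Collects visible text, skipping script/style/noscript/template content."""

    SKIP = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep non-whitespace text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def classify_response(html: str) -> str:
    """Return 'empty', 'challenge', 'js_shell', or 'ok' for a fetched page."""
    if not html or not html.strip():
        return "empty"
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "challenge"
    parser = _TextCounter()
    parser.feed(html)
    parser.close()
    visible = " ".join(parser.chunks)
    if len(visible) < MIN_VISIBLE_CHARS:
        # Likely an empty shell like <div id="root"></div> awaiting hydration.
        return "js_shell"
    return "ok"
```

You would feed it the body of a plain HTTP fetch, for example the .text of a requests response. A "js_shell" result points at a headless browser, while a "challenge" result means the request was fingerprinted or the IP was flagged. A short legitimate static page can trip the "js_shell" threshold, so treat the result as a hint rather than a verdict.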