
I Built 77 Web Scrapers — Here Are the 10 Patterns That Actually Work
After building 77 scrapers, every problem is a variation of the same 10 patterns.

I've published 77 web scrapers on Apify Store. Reddit, Hacker News, Google News, Trustpilot, YouTube, Bluesky — you name it. Here are the 10 patterns I use in every single one.

## Pattern 1: Always use sessions

```python
import requests

# Bad: new connection every request
for url in urls:
    requests.get(url)  # TCP handshake every time

# Good: reuse the connection
session = requests.Session()
for url in urls:
    session.get(url)  # reuses the TCP connection
```

Impact: 2-5x faster for multiple requests to the same domain.

## Pattern 2: Exponential backoff on errors

```python
import time

def fetch(url, max_retries=3):
    for i in range(max_retries):
        try:
            resp = session.get(url, timeout=10)
            if resp.status_code == 429:  # rate limited: back off and retry
                time.sleep(2 ** i)
                continue
            resp.raise_for_status()
            return resp
        except Exception:
            if i == max_retries - 1:
                raise
            time.sleep(2 ** i)
```

## Pattern 3: Extract data with CSS selectors, not XPath

```python
from bs4 import BeautifulSoup
```
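As a minimal sketch of what CSS-selector extraction with BeautifulSoup's `select_one()` looks like in practice (the HTML snippet, tag names, and classes below are illustrative assumptions, not from the original article):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML for illustration only
html = """
<div class="post">
  <h2 class="title">Hello, world</h2>
  <a class="link" href="https://example.com">Read more</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() takes the same CSS selectors you test in browser devtools
title = soup.select_one("div.post h2.title").get_text(strip=True)
link = soup.select_one("a.link")["href"]

print(title)  # Hello, world
print(link)   # https://example.com
```

The upside of CSS selectors over XPath is that you can copy them straight out of the browser's inspector and they read the same way as your stylesheets.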
Continue reading on Dev.to




