
Web Scraping Cheat Sheet: Every Tool, API, and Pattern in One Place
Bookmark this. Everything you need for web scraping in one article.

Tools

| Tool | Use Case | Install |
|------|----------|---------|
| Cheerio | HTML parsing | `npm i cheerio` |
| Playwright | Browser automation | `npm i playwright` |
| xml2js | XML/RSS parsing | `npm i xml2js` |
| xlsx | Excel output | `npm i xlsx` |

Free APIs (No Key)

| API | URL Pattern |
|-----|-------------|
| Reddit | `reddit.com/r/SUB.json` |
| YouTube | `youtubei/v1/search` |
| Shopify | `store.com/products.json` |
| HN | `hn.algolia.com/api/v1/search` |
| Wikipedia | `en.wikipedia.org/w/api.php` |
| arXiv | `export.arxiv.org/api/query` |
| npm | `registry.npmjs.org/-/v1/search` |
| DuckDuckGo | `api.duckduckgo.com/?q=X&format=json` |
| Bluesky | `public.api.bsky.app/xrpc/` |

Anti-Bot Checklist

- [ ] Set a User-Agent header
- [ ] Add random delays (2-5 s) between requests
- [ ] Rotate user agents
- [ ] Handle HTTP 429 with exponential backoff
- [ ] Use Promise.allSettled for parallel requests
- [ ] Validate output data

Output Formats

```javascript
const fs = require("fs");

// JSON: pretty-printed with 2-space indentation
fs.writeFileSync("out.json", JSON.stringify(data, null, 2));

// CSV: one line per record, values joined with commas
// (no header row; values containing commas or quotes are not escaped)
const csv = data.map((d) => Object.values(d).join(",")).join("\n");
fs.writeFileSync("
```
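The Free APIs table gives URL patterns only. As a minimal sketch, assuming Node.js 18+ (for the global `fetch`), the helper below builds complete request URLs for five of those endpoints. The query parameter names (`query` for HN Algolia, `action`/`list`/`srsearch` for the Wikipedia Action API, `text` for the npm registry search) come from each service's public documentation rather than from this cheat sheet, and `ENDPOINTS`, `fetchJson`, and the default User-Agent string are hypothetical placeholders.

```javascript
// Hypothetical URL builders for some of the keyless endpoints in the table.
// Parameter names follow each service's public docs; verify before relying on them.
const ENDPOINTS = {
  // Hacker News full-text search via Algolia
  hn: (q) => `https://hn.algolia.com/api/v1/search?${new URLSearchParams({ query: q })}`,
  // Wikipedia search through the MediaWiki Action API
  wikipedia: (q) =>
    `https://en.wikipedia.org/w/api.php?${new URLSearchParams({
      action: "query",
      list: "search",
      srsearch: q,
      format: "json",
    })}`,
  // npm package search
  npm: (q) => `https://registry.npmjs.org/-/v1/search?${new URLSearchParams({ text: q })}`,
  // DuckDuckGo Instant Answer API
  duckduckgo: (q) => `https://api.duckduckgo.com/?${new URLSearchParams({ q, format: "json" })}`,
  // Reddit: append .json to a subreddit listing
  reddit: (sub) => `https://www.reddit.com/r/${encodeURIComponent(sub)}.json`,
};

// Fetch a URL and parse JSON, sending a User-Agent (placeholder value below).
async function fetchJson(url, userAgent = "my-scraper/0.1 (contact@example.com)") {
  const res = await fetch(url, { headers: { "User-Agent": userAgent } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}
```

Usage would look like `fetchJson(ENDPOINTS.hn("web scraping")).then((d) => console.log(d.hits.length))`. YouTube, Shopify, arXiv, and Bluesky are left out because their patterns need extra parameters or request bodies not given here.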
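Two items from the Anti-Bot Checklist, random 2-5 s delays and User-Agent rotation, fit naturally into a sequential fetch loop. This is a minimal sketch assuming Node.js 18+; the UA strings are placeholders rather than a recommended list, and `politeFetchAll`, `nextUserAgent`, and `randomDelayMs` are hypothetical names.

```javascript
// Placeholder User-Agent strings; substitute real ones for your use case.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
  "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
];

let uaIndex = 0;
// Return the next User-Agent in round-robin order.
function nextUserAgent() {
  const ua = USER_AGENTS[uaIndex % USER_AGENTS.length];
  uaIndex += 1;
  return ua;
}

// Random integer delay in the inclusive range [minMs, maxMs].
function randomDelayMs(minMs = 2000, maxMs = 5000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch URLs one at a time, rotating the UA and pausing between requests.
async function politeFetchAll(urls) {
  const pages = [];
  for (let i = 0; i < urls.length; i++) {
    const res = await fetch(urls[i], { headers: { "User-Agent": nextUserAgent() } });
    pages.push(await res.text());
    // No pause needed after the last request.
    if (i < urls.length - 1) await sleep(randomDelayMs());
  }
  return pages;
}
```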
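For the "Handle HTTP 429 with exponential backoff" item, one common approach is to wait `baseDelayMs * 2 ** attempt` milliseconds plus random jitter before each retry, and to prefer the server's `Retry-After` header when it gives a number of seconds. The sketch below assumes that scheme; `fetchWithBackoff` and its options are hypothetical, and `fetchImpl` is injectable so the retry logic can be exercised without network access. After `maxRetries` retries it returns the last 429 response instead of throwing.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry on HTTP 429 with exponential backoff plus jitter.
async function fetchWithBackoff(url, { maxRetries = 5, baseDelayMs = 1000, fetchImpl = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url);
    // Success, a non-429 error, or out of retries: hand the response back.
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Retry-After may be seconds or an HTTP date; only the numeric form is used here.
    const retryAfter = Number(res.headers.get("retry-after"));
    const delay =
      Number.isFinite(retryAfter) && retryAfter > 0
        ? retryAfter * 1000
        : baseDelayMs * 2 ** attempt + Math.random() * baseDelayMs;
    await sleep(delay);
  }
}
```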
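The checklist recommends `Promise.allSettled` for parallel work because, unlike `Promise.all`, it waits for every promise and reports each one as `fulfilled` or `rejected`, so a single failed page does not discard the rest. A minimal sketch follows; `scrapeAll` is a hypothetical name and `scrapeOne` stands for whatever per-URL function you already have. It starts every request at once, so for long URL lists you would still batch them to stay within the delay advice.

```javascript
// Run scrapeOne on every URL in parallel and split results into successes and failures.
async function scrapeAll(urls, scrapeOne) {
  // The async wrapper turns synchronous throws into rejections too.
  const results = await Promise.allSettled(urls.map(async (url) => scrapeOne(url)));
  const ok = [];
  const failed = [];
  results.forEach((r, i) => {
    if (r.status === "fulfilled") ok.push(r.value);
    else failed.push({ url: urls[i], error: String(r.reason) });
  });
  return { ok, failed };
}
```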
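"Validate output data" can mean many things; one simple interpretation is to reject records with missing required fields and drop duplicates by a key before writing anything out. This sketch assumes that interpretation; `validateRecords` and the field names in the usage are hypothetical.

```javascript
// Keep records whose required fields are non-empty; drop duplicates by `key`.
function validateRecords(records, requiredFields, key) {
  const seen = new Set();
  const valid = [];
  const rejected = [];
  for (const rec of records) {
    const missing = requiredFields.filter(
      (f) => rec[f] === undefined || rec[f] === null || String(rec[f]).trim() === ""
    );
    if (missing.length > 0) {
      rejected.push({ rec, reason: `missing: ${missing.join(", ")}` });
    } else if (seen.has(rec[key])) {
      rejected.push({ rec, reason: "duplicate" });
    } else {
      seen.add(rec[key]);
      valid.push(rec);
    }
  }
  return { valid, rejected };
}
```

For example, `validateRecords(data, ["title", "url"], "url")` keeps the first record for each URL that has both a title and a URL.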
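The CSV one-liner under Output Formats writes no header row and produces broken rows when a value contains a comma, a double quote, or a newline. A slightly longer sketch using RFC 4180 quoting (wrap such fields in double quotes and double any embedded quotes) is below. It assumes every record has the same keys as the first one and uses `\n` rather than RFC 4180's CRLF line endings; `escapeCsv`, `toCsv`, and `writeCsv` are hypothetical names.

```javascript
const fs = require("node:fs");

// Quote a field if it contains a comma, quote, or line break; double embedded quotes.
function escapeCsv(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header row from the first record's keys, then one line per record.
function toCsv(rows) {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(escapeCsv).join(",")];
  for (const row of rows) lines.push(headers.map((h) => escapeCsv(row[h])).join(","));
  return lines.join("\n");
}

// Write the CSV to a path of your choosing.
function writeCsv(filePath, rows) {
  fs.writeFileSync(filePath, toCsv(rows));
}
```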
Continue reading on Dev.to Tutorial



