
Next.js 14 cron scraping: rate limits + retries
Run a daily scraper on Vercel. Without melting sources. Enforce per-host rate limits in Node. Not "sleep(1000)". Make retries idempotent with Postgres locks. Store failures so I can re-run only the broken ones.

Context

I'm building a job board for Psychiatric Mental Health Nurse Practitioners. 8,000+ active listings. 2,000+ companies. The pipeline scrapes 200+ jobs daily from multiple sources. Some are nice JSON feeds. Most aren't.

My first version was dumb. One cron. One loop. Fetch everything. It worked. Until it didn't. 429s. Random 403s. Timeouts. Worse: half a run would succeed, then retries would duplicate work and waste time.

This post is how I stabilized it. Rate limiting by host. Jitter. Backoff. And a Postgres lock so reruns don't stomp each other.

1) I stopped using "one cron to rule them all"

I used to do this: "Cron hits /api/scrape and that endpoint scrapes everything." Brutal. One slow host makes the whole run slow. And Vercel timeouts become your scheduler. Now I split
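The "per-host rate limits, not sleep(1000)" idea can be sketched like this. This is my illustration, not the post's actual code: the `throttle` helper, the host budgets, and the +50% jitter factor are all assumptions.

```typescript
// Per-host throttle with jitter (hypothetical sketch).
// Tracks the last request time per host so different hosts
// don't share one global delay.
const lastHit = new Map<string, number>();

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Block until `host` is allowed another request, then record the hit.
// `baseMs` is the minimum gap between requests to that host; a random
// jitter of up to +50% is added so runs don't hit hosts in lockstep.
async function throttle(host: string, baseMs = 3000): Promise<void> {
  const gap = baseMs + Math.random() * baseMs * 0.5;
  const due = (lastHit.get(host) ?? 0) + gap;
  const wait = due - Date.now();
  if (wait > 0) await sleep(wait);
  lastHit.set(host, Date.now());
}

// Usage inside a scrape loop (job shape assumed):
//   await throttle(new URL(job.url).host, 2000);
//   const res = await fetch(job.url);
```

The point of the per-host map is that a slow or strict source only slows itself down; a fast JSON feed on another host isn't held hostage by it.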
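For the backoff-plus-failure-storage part, a minimal retry wrapper might look like the following. Again a sketch under my own assumptions: the attempt count, delays, and `withRetries` name are invented, and the failure table is only described in the comment.

```typescript
// Bounded retries with exponential backoff and jitter (hypothetical sketch).
const delay = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

// Retry `fn` up to `attempts` times, doubling the wait each round and
// adding jitter. Rethrows the last error so the caller can record a
// failure row (e.g. url, error, failed_at) and re-run only broken jobs.
async function withRetries<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      const backoff = baseMs * 2 ** i + Math.random() * baseMs;
      await delay(backoff);
    }
  }
  throw lastErr;
}
```

Persisting the final failure instead of swallowing it is what makes "re-run only the broken ones" possible: the rerun queries the failure table rather than scraping everything again.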
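One way to get the "Postgres lock so reruns don't stomp each other" behavior is a per-job advisory lock. The sketch below is my guess at the shape, not the post's implementation: the `Db` interface stands in for a real `pg` client, and `withJobLock`/`lockKey` are invented names. Note that `pg_try_advisory_lock` is session-scoped, so with a connection pool the lock, the work, and the unlock must all run on the same client.

```typescript
import { createHash } from "node:crypto";

// Minimal query interface standing in for a `pg` client (assumption).
type Db = {
  query(sql: string, params: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
};

// Derive a stable signed 64-bit advisory-lock key from the job URL.
function lockKey(url: string): bigint {
  return createHash("sha256").update(url).digest().readBigInt64BE(0);
}

// Run `work` only if no other process holds this job's lock.
// pg_try_advisory_lock returns false immediately instead of blocking,
// so a rerun skips jobs that a live run is already processing.
async function withJobLock(
  db: Db,
  url: string,
  work: () => Promise<void>,
): Promise<boolean> {
  const key = lockKey(url).toString();
  const { rows } = await db.query("SELECT pg_try_advisory_lock($1) AS locked", [key]);
  if (!rows[0].locked) return false; // another run owns this job; skip it
  try {
    await work();
    return true;
  } finally {
    await db.query("SELECT pg_advisory_unlock($1)", [key]);
  }
}
```

Because the lock attempt is try-not-wait, an overlapping rerun degrades to a cheap no-op per contested job instead of duplicating scrapes.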
Continue reading on Dev.to Webdev



