
# How I Built a Job Aggregator That Scrapes 80+ Sites Daily
Last year, job seekers in Azerbaijan had to check 10+ websites every morning: boss.az, hellojob.az, jobsearch.az, LinkedIn, plus dozens of company career pages. No one aggregated them. So I built BirJob, a scraper that pulls from 80+ sources into one searchable platform. Here's how it works under the hood.

## The Architecture

```
GitHub Actions (cron, twice daily)
        ↓
80+ Python scrapers (aiohttp + BeautifulSoup)
        ↓
PostgreSQL on Neon (dedup via md5 hash)
        ↓
Next.js 14 on Vercel (SSR + API routes)
        ↓
Users search / get alerts via Email + Telegram
```

## The Scraper System

Each scraper extends a `BaseScraper` class:

```python
class BaseScraper:
    async def fetch_url_async(self, url, session):
        # aiohttp with retry logic, rate limiting
        # returns HTML string or JSON dict
        ...

    def save_to_db(self, df):
        # pandas DataFrame → PostgreSQL
        # ON CONFLICT (apply_link) DO UPDATE
        # dedup_hash = md5(company + title)
        ...
```

Most sites are simple HTML; BeautifulSoup handles them. A few are SPAs (Next.js, React) that need Playwright.
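The retry logic inside `fetch_url_async` isn't spelled out above. Here is a minimal, dependency-free sketch of the pattern with exponential backoff; the `fetch_with_retry` helper and its parameters are my own illustration, not BirJob's actual code:

```python
import asyncio

async def fetch_with_retry(fetch, url, retries=3, backoff=0.01):
    """Await fetch(url), retrying on failure with exponential backoff."""
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(backoff * 2 ** attempt)

# Demo: a fake fetcher that fails twice, then succeeds on attempt 3.
calls = {"n": 0}

async def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"

html = asyncio.run(fetch_with_retry(flaky_fetch, "https://boss.az"))
```

In the real class the `fetch` callable would be an `aiohttp` session request; the backoff keeps 80+ scrapers from hammering a site that is momentarily down.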
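The dedup scheme (an md5 fingerprint of company + title, with an upsert keyed on `apply_link`) can be sketched with stdlib `sqlite3` standing in for PostgreSQL, since SQLite supports the same `ON CONFLICT ... DO UPDATE` syntax. The table and column names here are guesses for illustration, not the real schema:

```python
import hashlib
import sqlite3

def dedup_hash(company: str, title: str) -> str:
    """md5(company + title): a stable fingerprint for a posting."""
    return hashlib.md5((company + title).encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        apply_link TEXT PRIMARY KEY,
        company    TEXT,
        title      TEXT,
        dedup_hash TEXT
    )
""")

def save_job(company, title, apply_link):
    # Same shape as the Postgres upsert: re-scraping a known link
    # refreshes the row instead of inserting a duplicate.
    conn.execute(
        """
        INSERT INTO jobs (apply_link, company, title, dedup_hash)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (apply_link) DO UPDATE SET
            company = excluded.company,
            title = excluded.title,
            dedup_hash = excluded.dedup_hash
        """,
        (apply_link, company, title, dedup_hash(company, title)),
    )

save_job("Acme", "Backend Engineer", "https://boss.az/jobs/1")
save_job("Acme", "Backend Engineer (updated)", "https://boss.az/jobs/1")
rows = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]  # still 1 row
```

Keying the upsert on `apply_link` means the link is the identity of a posting, while `dedup_hash` lets the platform spot the same job cross-posted under different links on different boards.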
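For the simple-HTML sites, a per-site scraper boils down to "fetch, select, build rows." A hypothetical sketch of that parsing step; the markup, CSS selectors, and `example-jobs.az` domain are invented for illustration and will differ for every real site:

```python
from bs4 import BeautifulSoup

HTML = """
<div class="job">
  <a class="title" href="/jobs/1">Backend Engineer</a>
  <span class="company">Acme</span>
</div>
<div class="job">
  <a class="title" href="/jobs/2">Data Analyst</a>
  <span class="company">Globex</span>
</div>
"""

def parse_jobs(html: str, base_url: str = "https://example-jobs.az"):
    """Turn a listings page into rows ready for save_to_db()."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.job"):
        link = card.select_one("a.title")
        rows.append({
            "title": link.get_text(strip=True),
            "company": card.select_one("span.company").get_text(strip=True),
            "apply_link": base_url + link["href"],  # absolutize relative hrefs
        })
    return rows

jobs = parse_jobs(HTML)
```

Each concrete scraper only has to supply selectors like these; fetching, retries, and persistence all live in `BaseScraper`.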
*Continue reading on Dev.to.*




