
# How I Built a Job Aggregator That Scrapes 80+ Sites Daily
Last year, job seekers in Azerbaijan had to check 10+ websites every morning: boss.az, hellojob.az, jobsearch.az, LinkedIn, plus dozens of company career pages. No one aggregated them. So I built BirJob, a scraper that pulls from 80+ sources into one searchable platform. Here's how it works under the hood.

## The Architecture

```
GitHub Actions (cron, twice daily)
        ↓
80+ Python scrapers (aiohttp + BeautifulSoup)
        ↓
PostgreSQL on Neon (dedup via md5 hash)
        ↓
Next.js 14 on Vercel (SSR + API routes)
        ↓
Users search / get alerts via Email + Telegram
```

## The Scraper System

Each scraper extends a `BaseScraper` class:

```python
class BaseScraper:
    async def fetch_url_async(self, url, session):
        # aiohttp with retry logic, rate limiting
        # returns HTML string or JSON dict
        ...

    def save_to_db(self, df):
        # pandas DataFrame → PostgreSQL
        # ON CONFLICT (apply_link) DO UPDATE
        # dedup_hash = md5(company + title)
        ...
```

Most sites are simple HTML; BeautifulSoup handles them. A few are SPAs (Next.js, React) that need Playwright.
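The retry logic inside `fetch_url_async` isn't spelled out above. Here is a minimal, dependency-free sketch of the pattern with exponential backoff; the `fetch_with_retry` helper and its parameters are my own illustration, not BirJob's actual code:

```python
import asyncio

async def fetch_with_retry(fetch, url, retries=3, backoff=0.01):
    """Await fetch(url), retrying on failure with exponential backoff."""
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(backoff * 2 ** attempt)

# Demo: a fake fetcher that fails twice, then succeeds on attempt 3.
calls = {"n": 0}

async def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"

html = asyncio.run(fetch_with_retry(flaky_fetch, "https://boss.az"))
```

In the real class the `fetch` callable would be an `aiohttp` session request; the backoff keeps 80+ scrapers from hammering a site that is momentarily down.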
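The dedup scheme (an md5 fingerprint of company + title, with an upsert keyed on `apply_link`) can be sketched with stdlib `sqlite3` standing in for PostgreSQL, since SQLite supports the same `ON CONFLICT ... DO UPDATE` syntax. The table and column names here are guesses for illustration, not the real schema:

```python
import hashlib
import sqlite3

def dedup_hash(company: str, title: str) -> str:
    """md5(company + title): a stable fingerprint for a posting."""
    return hashlib.md5((company + title).encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        apply_link TEXT PRIMARY KEY,
        company    TEXT,
        title      TEXT,
        dedup_hash TEXT
    )
""")

def save_job(company, title, apply_link):
    # Same shape as the Postgres upsert: re-scraping a known link
    # refreshes the row instead of inserting a duplicate.
    conn.execute(
        """
        INSERT INTO jobs (apply_link, company, title, dedup_hash)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (apply_link) DO UPDATE SET
            company = excluded.company,
            title = excluded.title,
            dedup_hash = excluded.dedup_hash
        """,
        (apply_link, company, title, dedup_hash(company, title)),
    )

save_job("Acme", "Backend Engineer", "https://boss.az/jobs/1")
save_job("Acme", "Backend Engineer (updated)", "https://boss.az/jobs/1")
rows = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]  # still 1 row
```

Keying the upsert on `apply_link` means the link is the identity of a posting, while `dedup_hash` lets the platform spot the same job cross-posted under different links on different boards.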
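For the simple-HTML sites, a per-site scraper boils down to "fetch, select, build rows." A hypothetical sketch of that parsing step; the markup, CSS selectors, and `example-jobs.az` domain are invented for illustration and will differ for every real site:

```python
from bs4 import BeautifulSoup

HTML = """
<div class="job">
  <a class="title" href="/jobs/1">Backend Engineer</a>
  <span class="company">Acme</span>
</div>
<div class="job">
  <a class="title" href="/jobs/2">Data Analyst</a>
  <span class="company">Globex</span>
</div>
"""

def parse_jobs(html: str, base_url: str = "https://example-jobs.az"):
    """Turn a listings page into rows ready for save_to_db()."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.job"):
        link = card.select_one("a.title")
        rows.append({
            "title": link.get_text(strip=True),
            "company": card.select_one("span.company").get_text(strip=True),
            "apply_link": base_url + link["href"],  # absolutize relative hrefs
        })
    return rows

jobs = parse_jobs(HTML)
```

Each concrete scraper only has to supply selectors like these; fetching, retries, and persistence all live in `BaseScraper`.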
*Continue reading on Dev.to.*




