
Grants to Investments Part 2-3: Models and Pipelines
## 🚀 Grants ETL Pipeline — Rust + Transformer-Based Classification

### 📌 Overview

I built an end-to-end ETL pipeline to ingest, classify, and analyze Canadian government grant data. The project combines:

- ⚡ High-performance data extraction using Rust
- 🧠 Semantic classification using BERT (zero-shot)
- 📊 Structured output ready for downstream analytics and dashboarding

This project demonstrates systems design, data engineering, and applied NLP in a production-style pipeline.

### 🧩 Extraction Layer (Rust)

**The Problem**

The Grants Canada portal has no accessible API — only an HTML-rendered search interface. I needed a way to extract structured data at scale.

**The Solution**

I built a custom scraper targeting the paginated search endpoint:

`https://search.open.canada.ca/grants/?page={}&sort=agreement_start_date+desc`

**Key Decisions**

I started with Python but switched to Rust for performance at scale. The Rust scraper uses:

- `scraper` — for HTML parsing
- `csv` — for structured output

It is designed to handle large-scale extraction.
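The pagination pattern above can be sketched as a small helper that fills the `{}` placeholder in the endpoint. This is a minimal sketch: the function name is mine, and a real run would fetch each URL and parse the HTML with the `scraper` crate, stopping when a page returns no results.

```rust
// Builds the URL for one page of the paginated search endpoint.
// The endpoint and query string come from the post; `page_url` is
// an illustrative name, not the project's actual function.
fn page_url(page: u32) -> String {
    format!(
        "https://search.open.canada.ca/grants/?page={}&sort=agreement_start_date+desc",
        page
    )
}

fn main() {
    // Walk the first few pages; a real scraper would fetch each URL,
    // parse the result rows, and advance until the results run out.
    for page in 1..=3 {
        println!("{}", page_url(page));
    }
}
```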
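For the structured-output step, the post uses the `csv` crate; the sketch below uses only the standard library to show what that serialization has to handle. The record's field names are assumptions — the real columns depend on the portal's HTML.

```rust
// Hypothetical grant record; real field names depend on the portal.
struct GrantRecord {
    recipient: String,
    agreement_value: String,
    start_date: String,
}

// Minimal CSV escaping: wrap a field in quotes when it contains a
// comma, quote, or newline, doubling embedded quotes (RFC 4180).
// The `csv` crate does this (and more) automatically.
fn csv_field(s: &str) -> String {
    if s.contains(',') || s.contains('"') || s.contains('\n') {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

impl GrantRecord {
    fn to_csv_row(&self) -> String {
        [&self.recipient, &self.agreement_value, &self.start_date]
            .iter()
            .map(|f| csv_field(f))
            .collect::<Vec<_>>()
            .join(",")
    }
}

fn main() {
    let rec = GrantRecord {
        recipient: "Acme Research, Inc.".to_string(),
        agreement_value: "150000".to_string(),
        start_date: "2024-01-15".to_string(),
    };
    println!("{}", rec.to_csv_row());
}
```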
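On the classification side, zero-shot BERT classification is typically framed as natural language inference: each candidate label is turned into a hypothesis such as "This grant is about {label}.", and the label whose hypothesis the model scores as most entailed by the grant text wins. The sketch below shows only that hypothesis-construction step; the labels and template are my assumptions, and the actual entailment scoring would be done by an NLI-fine-tuned model (e.g. via `rust-bert` or Hugging Face `transformers`), which is outside this snippet.

```rust
// Builds one NLI hypothesis per candidate label. The template and
// label set are illustrative, not the project's actual ones.
fn hypotheses(labels: &[&str]) -> Vec<String> {
    labels
        .iter()
        .map(|label| format!("This grant is about {}.", label))
        .collect()
}

fn main() {
    let labels = ["health research", "agriculture", "clean energy"];
    // Each (grant text, hypothesis) pair would be scored by an NLI
    // model; the highest-entailment label becomes the prediction.
    for h in hypotheses(&labels) {
        println!("{}", h);
    }
}
```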
Continue reading on Dev.to
