
How I Built a Soccer Coach Contact Extractor for Messy Athletics Websites
Most athletics websites look simple until you try to extract structured data from them at scale. Coach pages are especially messy. One school gives you a clean staff directory with mailto: links. Another hides emails behind Cloudflare. Another puts names on the roster page and the actual contact info on a separate bio page. Another sends back an empty shell and expects JavaScript to do the rest.

That is what this project solves. football-soccer-emails is a TypeScript-based extractor that pulls soccer and football coach contact information from athletics websites and turns it into structured records. It supports direct URLs, public Google Sheets, and an Apify workflow for batch runs.

The reason I built it this way is simple: I tried a version of this problem around 2017 or 2018 using heuristics only, and it was roughly 40% accurate. That was about as far as rules alone would take me. With LLMs and a multi-stage extraction flow, this same class of problem can now get into the 90%+ range.
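
To make the first two cases above concrete, here is a minimal sketch of a purely rule-based pass over raw HTML. It is not taken from football-soccer-emails; the function names `decodeCfEmail` and `extractEmails` are hypothetical. It also assumes that "hides emails behind Cloudflare" refers to Cloudflare's email obfuscation, where the address is stored as a hex string in a `data-cfemail` attribute and the first byte is an XOR key for the rest.

```typescript
// Illustrative sketch only: helper names are hypothetical and not part of
// the football-soccer-emails codebase.

/** Decode a hex string from a data-cfemail attribute (assumed Cloudflare obfuscation). */
function decodeCfEmail(encoded: string): string {
  // The first byte is the XOR key; every following byte is one character XOR key.
  const key = parseInt(encoded.slice(0, 2), 16);
  let email = "";
  for (let i = 2; i < encoded.length; i += 2) {
    email += String.fromCharCode(parseInt(encoded.slice(i, i + 2), 16) ^ key);
  }
  return email;
}

/** Collect unique, lowercased emails from mailto: links and data-cfemail attributes. */
function extractEmails(html: string): string[] {
  const found = new Set<string>();

  // Case 1: plain mailto: links. Stop at a quote or at "?" (e.g. ?subject=...).
  for (const m of html.matchAll(/href=["']mailto:([^"'?]+)/gi)) {
    found.add(m[1].trim().toLowerCase());
  }

  // Case 2: obfuscated addresses stored as hex in data-cfemail.
  for (const m of html.matchAll(/data-cfemail=["']([0-9a-f]+)["']/gi)) {
    found.add(decodeCfEmail(m[1]).toLowerCase());
  }

  return Array.from(found);
}
```

Rules like these only cover the easy cases. They do nothing for contact info split across roster and bio pages or for pages that render client-side, which is where a heuristics-only approach runs out.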



