
Three Things I Learned Using LLMs in a Data Pipeline
This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

"Ghibliotheque Presents: My Neighbor Totoro + Intro"

That's a real cinema listing title, but it's not a title you can just search for. And as titles go, it's one of the more straightforward ones. Things get even messier when we get into cinema listing pages. I've seen venues that don't include a year, don't include the director, or give you little more than a title and a one-line description. If you're building an aggregator that needs to identify what's actually showing, you spend a lot of time staring at strings like this.

I've been building Clusterflick, a cinema aggregator for London that pulls listings from 250+ venues daily. I thought scraping would be the hard part. But figuring out what a listing actually is (which film, matched to which entry in The Movie DB) is where a lot of complexity lies. And it's where I've been using Gemini. There's a whole layer of work involved
Continue reading on Dev.to


