Three Things I Learned Using LLMs in a Data Pipeline

This is a submission for the Built with Google Gemini: Writing Challenge.

What I Built with Google Gemini

"Ghibliotheque Presents: My Neighbor Totoro + Intro"

That's a real cinema listing title, but it's not a title you can just search for. And as titles go, it's one of the more straightforward ones. Things get messier still on the listing pages themselves. I've seen venues that don't include a year, don't include the director, or give you little more than a title and a one-line description. If you're building an aggregator that needs to identify what's actually showing, you spend a lot of time staring at strings like this.

I've been building Clusterflick, a cinema aggregator for London that pulls listings from 250+ venues daily. I thought scraping would be the hard part. But figuring out what a listing actually is (which film, matched to which entry in The Movie DB) is where a lot of the complexity lies. And it's where I've been using Gemini.

There's a whole layer of work involved
