
Cleaning Cinema Titles Before You Can Even Search
When Clusterflick first started pulling listings, I assumed the hard part would be the scraping. Getting the data off 250+ different cinema websites, each with their own structure and quirks, is where the complexity lives, right? But before any of that work pays off, before a single TMDB search can happen, there's a problem sitting right at the start of the pipeline: cinema listings don't always give you a clean film title. They give you something like this:

BAR TRASH – THE ZODIAC KILLER (1971) at Beer Merchants Tap

Or:

(IMAX) Princess Mononoke: 2025 Re-Release Subtited

Or my personal favourite:

MUPPET PUPPETS CHRISTMAS CAROL WORKSHOP & SING-ALONG

None of those are going to find anything useful in a TMDB search. So before matching can happen, there's a normalisation step, and it's grown into something with its own test suite of nearly 15,000 cases.

The Obvious Stuff

The easy wins are the patterns you see immediately once you start looking at real listings. Film Clubs will attach
Continue reading on Dev.to JavaScript
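
As a rough illustration of the kind of normalisation step described above, here is a minimal sketch in JavaScript. It is not Clusterflick's actual code: the function name `normaliseTitle`, the `EVENT_PREFIXES` and `FORMAT_TAGS` lists, and every rule in it are assumptions made up for this example, chosen only to handle the sample listings quoted earlier.

```javascript
// Illustrative only: these names and rules are assumptions, not Clusterflick's code.
const EVENT_PREFIXES = ["BAR TRASH"]; // hypothetical list of known event-series names
const FORMAT_TAGS = ["IMAX"]; // hypothetical list of screening-format tags

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function normaliseTitle(raw) {
  let title = raw.trim();
  let year = null;

  // Remove a leading format tag in parentheses, e.g. "(IMAX) ".
  const tags = FORMAT_TAGS.map(escapeRegExp).join("|");
  title = title.replace(new RegExp(`^\\((?:${tags})\\)\\s*`, "i"), "");

  // Remove a known event prefix followed by a hyphen or en dash.
  const prefixes = EVENT_PREFIXES.map(escapeRegExp).join("|");
  title = title.replace(
    new RegExp(`^(?:${prefixes})\\s*[-\\u2013]\\s*`, "i"),
    ""
  );

  // Remove a trailing " at <venue>" that follows a parenthesised year.
  title = title.replace(/(\(\d{4}\))\s+at\s+.+$/i, "$1");

  // Remove a re-release note such as ": 2025 Re-Release Subtited".
  title = title.replace(/:\s*\d{4}\s+re-release\b.*$/i, "");

  // Pull a trailing "(1971)" out of the title and keep it as the year.
  const yearMatch = title.match(/\s*\((\d{4})\)$/);
  if (yearMatch) {
    year = Number(yearMatch[1]);
    title = title.slice(0, yearMatch.index);
  }

  return { title: title.trim(), year };
}
```

With these assumed rules, the first sample listing reduces to the title `THE ZODIAC KILLER` with year `1971`, and the second to `Princess Mononoke` with no year. The third listing passes through unchanged, which hints at why a handful of simple patterns is not enough on its own and why a large test suite becomes useful.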



