
How We Query 16.8M SIRENE Establishments in 66ms
The French SIRENE database contains information about every registered business in France, over 30 million establishments in total. We imported the 16.8 million active ones into GEOREFER and needed to make them searchable by name in under 100ms. Here's how we did it with PostgreSQL 16 and pg_trgm.

The Challenge

Our establishment table has 16.8 million rows. Users need to search by:

- SIREN (9 digits): exact match, trivial with a B-tree index
- SIRET (14 digits): exact match, same approach
- Company name: fuzzy match, and this is where it gets interesting

The name search needs to handle:

- Partial matches: "Total" should find "TotalEnergies SE"
- Typos: "Miclein" should find "Michelin"
- Accent insensitivity: "Societe Generale" should match "Société Générale"

The Naive Approach: ILIKE

First attempt:

    SELECT * FROM georefer.establishment
    WHERE company_name ILIKE '%total%'
    LIMIT 25;

EXPLAIN ANALYZE result:

    Seq Scan on establishment
      Filter: (company_name ~~* '%total%')
      Rows Removed by Filter: 16799975
    Planning Time: 0.1
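The sequential scan above reads almost every row to return 25 matches, and that is the cost a trigram index is meant to remove. Below is a minimal sketch of what a pg_trgm setup could look like, assuming the `georefer.establishment` table and `company_name` column from the query above. The index name is hypothetical, and this is an illustration of the extension's standard features, not necessarily the configuration the article arrives at.

```sql
-- Sketch only: the index name is hypothetical and the final setup
-- in the article may differ.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN index over the trigrams (3-character sequences) of company_name.
CREATE INDEX establishment_company_name_trgm_idx
    ON georefer.establishment
    USING gin (company_name gin_trgm_ops);

-- Partial match: with a trigram index available, the planner can
-- answer an unanchored ILIKE pattern from the index instead of
-- scanning the whole table.
SELECT *
FROM georefer.establishment
WHERE company_name ILIKE '%total%'
LIMIT 25;

-- Typo-tolerant match: the % operator keeps rows whose trigram
-- similarity to the search term clears pg_trgm.similarity_threshold
-- (0.3 by default), and similarity() exposes the score for ranking.
SELECT company_name,
       similarity(company_name, 'Miclein') AS score
FROM georefer.establishment
WHERE company_name % 'Miclein'
ORDER BY score DESC
LIMIT 25;
```

Note that `similarity()` compares whole strings, so a short search term can score poorly against a long legal name. pg_trgm also offers `word_similarity()` and the `<%` operator for matching a term against part of a longer string. Accent insensitivity is a separate concern that trigram matching alone does not solve.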
Continue reading on Dev.to



