
Why Your Database Hates COUNT(DISTINCT) and Why HyperLogLog is the Cure
TL;DR: HyperLogLog (HLL) is a probabilistic data structure that estimates unique counts by analyzing the bit patterns of hashed IDs. Instead of storing every user ID, it tracks the maximum number of leading zeros in hashed values, allowing you to estimate billions of unique views using about 12KB of memory with ~2% error. Scaling unique view counts is a silent database killer. If you try to track every user_id for every post on a platform with millions of users, your infrastructure costs will eventually eclipse the value of the feature itself. You're effectively burning RAM to show a number on a UI that doesn't even need to be 100% precise. I’ve seen plenty of teams try the naive route: a dedicated table of user IDs and a big COUNT(DISTINCT) query. At a certain scale, that stops being a query and starts being a resource exhaustion event. If you want to count millions of unique views across millions of posts without your database screaming for mercy, you have to stop storing data and st
Continue reading on Dev.to
Opens in a new tab



