Why Your Database Hates COUNT(DISTINCT) and Why HyperLogLog is the Cure

TL;DR: HyperLogLog (HLL) is a probabilistic data structure that estimates unique counts by analyzing the bit patterns of hashed IDs. Instead of storing every user ID, it hashes each one, uses part of the hash to pick a register, and keeps only the maximum number of leading zeros seen in each register. That lets you estimate billions of unique views using about 12 KB of memory per counter, with a typical error under ~2% (at 16,384 six-bit registers, the standard error is about 0.81%).

Scaling unique view counts is a silent database killer. If you try to track every user_id for every post on a platform with millions of users, your infrastructure costs will eventually eclipse the value of the feature itself. You're effectively burning RAM to show a number on a UI that doesn't even need to be 100% precise.

I’ve seen plenty of teams try the naive route: a dedicated table of user IDs and a big COUNT(DISTINCT) query. At a certain scale, that stops being a query and starts being a resource exhaustion event. If you want to count millions of unique views across millions of posts without your database screaming for mercy, you have to stop storing data and st
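To make the TL;DR's description concrete, here is a minimal Python sketch of the idea: hash each ID, use the first few bits to choose a register, and remember the longest run of leading zeros seen in the rest. The class name `HyperLogLog`, the choice of SHA-1 as the hash, and the default of 14 index bits are assumptions made for illustration; this is not a production implementation and is not necessarily how any particular database implements it.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative only)."""

    def __init__(self, p: int = 14) -> None:
        # p index bits give m = 2**p registers; p = 14 means 16,384 registers.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Hash to 64 bits. SHA-1 is an assumption; any well-mixed hash works.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        idx = x >> (64 - self.p)               # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = leading zeros in the remaining bits, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        # Harmonic mean of 2**register, scaled by alpha * m**2.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: use linear counting while registers are empty.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


if __name__ == "__main__":
    views = HyperLogLog()
    for user_id in range(100_000):
        views.add(f"user-{user_id}")
    print("estimated unique viewers:", views.count())
```

Adding the same ID twice never changes a register, so repeat views are ignored for free. A Python list of integers is much larger than 12 KB; that figure comes from packing 16,384 registers at 6 bits each (16,384 × 6 / 8 = 12,288 bytes), which this sketch does not attempt.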
