
How I Built a Two-Level Cache to Serve Millions of Lookups in Under a Millisecond
Every high-traffic system eventually hits the same wall: your data store can't keep up. For us, the breaking point came when a simple product lookup — backed by Elasticsearch — started showing tail latencies creeping past 80ms. At scale, that's the kind of number that keeps you up at night. The solution wasn't a faster cluster. It was rethinking where data lives before it ever reaches Elasticsearch at all. This post walks through the two-level caching strategy we built using Caffeine as an in-process L1 cache and Redis as a distributed L2 cache, with Elasticsearch sitting behind as the source of truth.

The Problem with Single-Layer Caching

Most teams reach for Redis the moment they need a cache. It's fast, it's familiar, and it works. But Redis still lives over the network. Even on a low-latency internal network, you're paying 1–5ms per hop. Do that a few thousand times per second across many services, and it adds up. The other option — caching inside the application process using some
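The L1/L2 arrangement described above can be sketched as a simple read-through lookup: check the in-process cache first, fall back to the shared cache, and only then hit the data store, populating both layers on the way back. This is a minimal illustrative sketch, not the article's actual implementation — the class and method names are my own, and plain maps stand in for Caffeine (L1) and Redis (L2) so the example is self-contained.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch of a two-level read-through cache.
// In a real service, l1 would be a Caffeine cache (with size and TTL
// policies) and l2 a Redis client; here both are in-memory maps.
public class TwoLevelCache<K, V> {
    private final Map<K, V> l1;              // in-process, bounded LRU (Caffeine's role)
    private final Map<K, V> l2;              // shared across instances (Redis's role)
    private final Function<K, V> sourceOfTruth; // backing store (Elasticsearch's role)

    public TwoLevelCache(int l1MaxSize, Map<K, V> l2, Function<K, V> sourceOfTruth) {
        // Access-ordered LinkedHashMap gives us a tiny LRU for the sketch.
        this.l1 = Collections.synchronizedMap(
            new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > l1MaxSize; // evict least-recently-used beyond the cap
                }
            });
        this.l2 = l2;
        this.sourceOfTruth = sourceOfTruth;
    }

    public V get(K key) {
        V value = l1.get(key);               // L1 hit: no network hop at all
        if (value != null) return value;

        value = l2.get(key);                 // L2 hit: one network round trip (~1-5ms)
        if (value != null) {
            l1.put(key, value);              // promote into L1 for subsequent lookups
            return value;
        }

        value = sourceOfTruth.apply(key);    // slow path: query the data store
        l2.put(key, value);                  // populate both layers on the way back
        l1.put(key, value);
        return value;
    }
}
```

The key property is that repeat lookups for hot keys never leave the process: only the first miss pays the network hop, and only a miss in both layers pays the full data-store round trip.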
Continue reading on Dev.to



