
jemalloc vs malloc vs tcmalloc: Why Your Server's Default Allocator Is Killing P99 Latency
A few months ago, I was chasing a P99 latency spike on a multi-threaded service handling roughly 40,000 requests per second. The flame graphs pointed at an unusual suspect: malloc. Not a slow database query. Not a network timeout. The standard glibc memory allocator was holding a global lock, and threads were lining up behind it like cars at a single-lane toll booth.

I swapped in jemalloc with a single LD_PRELOAD change. P99 dropped 35%. No code changes, no architecture redesign. Just a better allocator.

This is one of those things where the boring answer is actually the right one. Most engineers never think about their memory allocator. They shouldn't have to. But if you're running multi-threaded server workloads at any real scale, the default allocator is leaving performance on the table.

The Problem With glibc malloc

glibc's malloc implementation (based on ptmalloc2) was designed when "multi-
Continue reading on Dev.to



