How One Field in a Sort Query Brought Down Our OpenSearch Cluster

TL;DR: We added _id as a sort tie-breaker to fix non-deterministic pagination. But _id is a metadata field with no doc values, so sorting on it forces OpenSearch to load its values into the JVM heap as fielddata. At scale, this pushed JVM heap usage to 98–99%, tripped circuit breakers, and flooded us with 429 (Too Many Requests) errors. The fix: use a properly mapped keyword field with doc values instead. The lesson: know what lives on the heap and what doesn't before you sort on it.

Picture This

You've just deployed what looks like a routine fix: a two-line change, a sort tie-breaker to handle non-deterministic pagination. Nothing that would raise an eyebrow in code review.

Minutes after the deploy, your monitoring lights up. JVM heap pressure is spiking. Errors are flooding in. Your OpenSearch cluster, which was perfectly healthy moments ago, is struggling to stay alive.

You didn't change your data. You didn't change your infrastructure. You added one field to a sort query.

That's exactly what happened to us.

The Change

We had a query sorting results
