How One Field in a Sort Query Brought Down Our OpenSearch Cluster

via Dev.to, by Joel Dsouza

TL;DR

- We added _id as a sort tie-breaker to fix non-deterministic pagination.
- _id is a metadata field with no doc values. Sorting on it loads everything into JVM heap via fielddata.
- At scale, this caused JVM heap usage to spike to 98–99%, triggered circuit breakers, and flooded us with 429 errors.
- The fix: use a properly mapped keyword field with doc values instead.
- Lesson: know what lives on heap and what doesn't before you sort on it.

Picture This

You've just deployed what looks like a routine fix. A two-line change. A sort tie-breaker to handle non-deterministic pagination. Nothing that would raise an eyebrow in code review.

Minutes after the deploy, your monitoring lights up. JVM pressure is spiking. Errors are flooding in. Your OpenSearch cluster, which was perfectly healthy moments ago, is struggling to stay alive.

You didn't change your data. You didn't change your infrastructure. You added one field to a sort query.

That's exactly what happened to us.

The Change

We had a query sorting results

Continue reading on Dev.to
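
To make the fix from the TL;DR concrete, here is a minimal sketch of what sorting on a keyword tie-breaker instead of _id could look like. It only builds the request bodies as plain Python dicts and does not talk to a cluster. The field name doc_id, the primary sort field created_at, and the use of search_after for pagination are assumptions for illustration; the excerpt does not say which names or pagination mechanism the author used.

```python
import json

# Hypothetical field name: a keyword copy of the document ID, written at index time.
TIE_BREAKER_FIELD = "doc_id"


def tie_breaker_mapping():
    """Index mapping with a keyword field to use as the sort tie-breaker.

    Keyword fields have doc values enabled by default; the setting is spelled
    out here only to make the intent obvious.
    """
    return {
        "mappings": {
            "properties": {
                TIE_BREAKER_FIELD: {"type": "keyword", "doc_values": True}
            }
        }
    }


def build_search_body(primary_field, page_size, search_after=None):
    """Build a search body that breaks sort ties on the keyword field, not _id."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        "sort": [
            {primary_field: {"order": "desc"}},
            # Tie-breaker backed by doc values (on disk), not fielddata (on heap).
            {TIE_BREAKER_FIELD: {"order": "asc"}},
        ],
    }
    if search_after is not None:
        # Sort values of the last hit from the previous page.
        body["search_after"] = list(search_after)
    return body


if __name__ == "__main__":
    print(json.dumps(tie_breaker_mapping(), indent=2))
    print(json.dumps(build_search_body("created_at", 50), indent=2))
```

The design point is that the tie-breaker only needs to be unique and sortable, so a keyword field holding the same value as the document ID serves the purpose without touching _id in the sort clause.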
