
Top-K queries with MongoDB search indexes (BM25)
A document database is more than a JSON datastore. It must also support efficient storage and advanced search: equality and range predicates, fuzzy text search, ranking, pagination, and limited sorted results (top‑k). BM25 indexes, which combine an inverted index and columnar doc values, are ideal for this, with mature open‑source implementations like Lucene (used by MongoDB) and Tantivy (used by ParadeDB).

ParadeDB brings Tantivy indexing to PostgreSQL via the pg_search extension and recently published an excellent article showing where GIN indexes fall short and how BM25 bridges the gap. Here, I’ll present the MongoDB equivalent using its Lucene‑based search indexes. I suggest reading ParadeDB’s post first, as it clearly explains the problem and the solution: How We Optimized Top K in Postgres (paradedb.com).

I'll be lazy and use the same dataset, index and query.

MongoDB with search ind
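
As a rough illustration of what a top‑k query against a MongoDB search index can look like, here is a minimal Python sketch that builds a `$search` aggregation pipeline. The helper name `top_k_pipeline`, the index name, and the field names (`message`, `severity`, `timestamp`) are assumptions made up for this example and may not match the dataset used in the full article.

```python
from typing import Any


def top_k_pipeline(term: str, max_severity: int, k: int,
                   index: str = "default") -> list[dict[str, Any]]:
    """Build a pipeline returning the k most recent documents that match.

    Field names and the index name are hypothetical placeholders.
    """
    return [
        {
            "$search": {
                "index": index,  # assumed search index name
                "compound": {
                    # full-text clause, scored by relevance
                    "must": [{"text": {"query": term, "path": "message"}}],
                    # range predicate that restricts matches without scoring
                    "filter": [
                        {"range": {"path": "severity", "lt": max_severity}}
                    ],
                },
                # sort inside the search stage instead of a later $sort
                "sort": {"timestamp": -1},
            }
        },
        # keep only the top k results
        {"$limit": k},
    ]


pipeline = top_k_pipeline("research", max_severity=3, k=10)
```

With PyMongo, such a pipeline would be passed to `collection.aggregate(pipeline)` on a deployment where a search index covering these fields exists. Filtering, sorting and limiting are all expressed next to the text clause here, which is the kind of combined predicate the introduction describes.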
Continue reading on Dev.to



