
Searchable JSON compression: page-level random access + ms lookups (and smaller than Zstd on our dataset)
Most JSON compression stories end at “make it smaller.” But in real systems, the bigger cost is often decompress + parse + scan, paid repeatedly.

I built SEE (Semantic Entropy Encoding): a searchable compression format for JSON/NDJSON that keeps data queryable while compressed, with page-level random access. On our dataset, SEE is smaller than Zstd and supports fast lookups (details + proof below).

Why this matters: the hidden “decompress+parse tax”

If you store NDJSON as zstd, most queries still have to:

- read large chunks
- decompress everything
- parse JSON
- scan for the field/value you need

Even if the compressed data is small, this CPU + I/O pattern is brutal at scale.

SEE targets workloads where you repeatedly need:

- exists / pos / eq-style queries
- random access
- low latency without full decompression

What SEE is (in 60 seconds)

SEE is a page-based, schema-aware format:

- page-level layout for random access
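To make the contrast between the “decompress+parse tax” and page-level random access concrete, here is a minimal Python sketch. It is a generic illustration under stated assumptions, not SEE's actual format: `build_pages`, `scan_all`, `get_record` and `PAGE_SIZE` are hypothetical names, each page holds a fixed number of records, and stdlib `zlib` stands in for zstd so the sketch has no third-party dependencies.

```python
import json
import zlib

# Hypothetical sketch of page-level random access over NDJSON (not SEE's format).
# Assumptions: records are grouped into fixed-size pages, each page is compressed
# independently with zlib (stand-in for zstd), and a small uncompressed index
# maps page number -> (byte offset, byte length) inside one blob.

PAGE_SIZE = 4  # records per page; hypothetical, kept small for the demo


def build_pages(records, page_size=PAGE_SIZE):
    """Compress records into independent pages; return (blob, index)."""
    blob = bytearray()
    index = []  # one (offset, length) entry per page
    for start in range(0, len(records), page_size):
        chunk = records[start:start + page_size]
        ndjson = "".join(json.dumps(r) + "\n" for r in chunk).encode("utf-8")
        compressed = zlib.compress(ndjson)
        index.append((len(blob), len(compressed)))
        blob += compressed
    return bytes(blob), index


def scan_all(blob, index, field, value):
    """Baseline 'tax': decompress every page, parse every line, then scan."""
    hits = []
    for offset, length in index:
        text = zlib.decompress(blob[offset:offset + length]).decode("utf-8")
        for line in text.splitlines():
            record = json.loads(line)
            if record.get(field) == value:
                hits.append(record)
    return hits


def get_record(blob, index, record_id, page_size=PAGE_SIZE):
    """Random access: decompress and parse only the page holding record_id."""
    page_no, slot = divmod(record_id, page_size)
    offset, length = index[page_no]
    text = zlib.decompress(blob[offset:offset + length]).decode("utf-8")
    return json.loads(text.splitlines()[slot])


if __name__ == "__main__":
    records = [{"id": i, "user": f"u{i % 3}"} for i in range(10)]
    blob, index = build_pages(records)
    print(len(index))                                 # 3 pages (4 + 4 + 2 records)
    print(get_record(blob, index, 7))                 # {'id': 7, 'user': 'u1'}
    print(len(scan_all(blob, index, "user", "u0")))   # 4 (ids 0, 3, 6, 9)
```

The design point is that `get_record` touches one compressed page no matter how large the file grows, while `scan_all` pays decompress + parse + scan for every page. Compressing pages independently typically gives up some ratio compared with one large stream, because each page's compressor starts without context from earlier pages; page size is the knob that trades ratio against lookup cost.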
Continue reading on Dev.to


