
I built a managed pgvector service and here's what I learned about vector search performance
I Ran pgvector on NVMe vs Cloud SSD. The Difference Shocked Me.

2,000 queries per second at under 4ms. That's what I'm getting on a $35/month server. Let me tell you how I got there and what I had to build to make it work.

The problem started with a side project: I was building a RAG pipeline. Standard stuff: OpenAI embeddings, PostgreSQL, the pgvector extension, an HNSW index. Everything worked fine in development. Then I moved it to a managed database on a regular cloud provider and watched my query latency go from 4ms to 47ms under any real load.

I spent two days thinking my index was wrong. Wrong ef_search value. Wrong m parameter. Wrong dimension count. None of that was it. The problem was the disk.

HNSW does not behave like a normal database query

Most database queries are sequential reads. Your disk reads a chunk of data in order, hands it back, done. SSDs are fast at this. Even gp3 cloud SSDs are fast at this.

HNSW is different. An HNSW index traversal is essentially a graph walk. You
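To make that contrast concrete, here is a minimal sketch of the greedy best-first search at the heart of an HNSW layer. This is illustrative Python, not pgvector's actual implementation; the graph, vectors, and the `ef` parameter here are toy stand-ins. The point is the access pattern: every hop follows an edge to an arbitrary node id, which in a disk-resident index means a random page read, not a sequential scan.

```python
import heapq
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_search(graph, vectors, entry, query, ef=4):
    """Greedy best-first walk over a proximity graph (HNSW-style, one layer).

    Each newly visited neighbor is a jump to an arbitrary node id -- on a
    disk-resident index, that is one random page read per hop.
    """
    visited = {entry}
    d0 = l2(vectors[entry], query)
    candidates = [(d0, entry)]   # min-heap: next nodes to expand
    best = [(-d0, entry)]        # max-heap via negation: ef closest so far
    hops = 0
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -best[0][0] and len(best) >= ef:
            break  # nearest unexpanded candidate is worse than worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            hops += 1  # each hop = one random access in a disk-resident index
            d = l2(vectors[nb], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(best, (-d, nb))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst
    return sorted((-d, n) for d, n in best), hops

# Tiny illustrative graph: node ids -> neighbor ids, plus 2-D vectors.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0),
           3: (1.0, 1.0), 4: (2.0, 2.0), 5: (0.5, 0.5)}
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4, 5],
         4: [3], 5: [3]}
results, hops = greedy_search(graph, vectors, entry=0, query=(0.6, 0.6), ef=3)
```

Even on this toy graph, the walk touches five nodes in an order dictated by the geometry of the data, not by their layout on disk. Scale that to millions of vectors and an `ef_search` in the hundreds, and each query turns into a burst of random reads, which is exactly where cloud SSDs fall behind local NVMe.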




