
I Tried Vector Search on Molecules. Here Is What Actually Happened.
In this article, I'll walk you through how I built a robust molecular similarity search system using ChemBERTa, RDKit, and Qdrant, and what I actually learned along the way.

Canonical URL: https://medium.com/towards-artificial-intelligence/i-tried-vector-search-on-molecules-heres-what-happened-7391b755efe4

TL;DR
I wanted to see if vector search could work on molecules the same way it works on text. It can. I used ChemBERTa to embed SMILES strings into 768-dim vectors, indexed them in Qdrant with molecular property metadata, and ran similarity search with payload filters applied during retrieval. The system surfaced structurally similar candidates that fingerprint-based search missed. This post walks through every step, including where it breaks.

GitHub: github.com/dvy246/qdrant
Stack: Python 3.10+, RDKit, ChemBERTa, Qdrant, FastAPI, Streamlit

Why I Built This
I had been spending a lot of time with vector databases and embedding-based search. Every example I came across was about text: docu
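To make the TL;DR concrete, here is a minimal sketch of similarity search with a payload filter applied during retrieval. It is not the author's code and does not call ChemBERTa or Qdrant: it uses only the Python standard library, the 3-dim vectors are made-up stand-ins for 768-dim embeddings, and the names `filtered_search`, `mol_weight`, and `max_mol_weight` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query, points, max_mol_weight, limit=3):
    """Rank only the points whose payload passes the filter.

    The filter runs before ranking, so a heavy molecule can never
    take up one of the `limit` result slots.
    """
    candidates = [
        p for p in points if p["payload"]["mol_weight"] <= max_mol_weight
    ]
    scored = [
        (cosine(query, p["vector"]), p["payload"]["smiles"]) for p in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]


# Toy index: the vectors are invented for illustration, not real embeddings.
# The molecular weights are those of ethanol, 1-propanol, benzene, 1-octanol.
points = [
    {"vector": [0.9, 0.1, 0.0], "payload": {"smiles": "CCO", "mol_weight": 46.07}},
    {"vector": [0.8, 0.2, 0.1], "payload": {"smiles": "CCCO", "mol_weight": 60.10}},
    {"vector": [0.1, 0.9, 0.2], "payload": {"smiles": "c1ccccc1", "mol_weight": 78.11}},
    {"vector": [0.85, 0.15, 0.05], "payload": {"smiles": "CCCCCCCCO", "mol_weight": 130.23}},
]

# Query with the ethanol stand-in vector, keeping molecules at or under 100 g/mol.
results = filtered_search([0.9, 0.1, 0.0], points, max_mol_weight=100.0)
for score, smiles in results:
    print(f"{smiles}: {score:.3f}")
```

The 1-octanol entry points in nearly the same direction as the query, yet it never appears in the results because the weight filter removes it before ranking. That is the difference between filtering during retrieval and trimming a ranked list afterwards.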



