I Ran 60 Autoresearch Experiments on a Production Search Algorithm. Here's What Actually Happened.

via Dev.to Python · PJ Hoberman

Everyone's writing about Karpathy's autoresearch. Most of it is "here's how the loop works" or "imagine the possibilities." I wanted to see what happens when you point it at a real codebase with a real metric, not a training script. So I ran two rounds, 60 total iterations. The first round improved things. The second round found nothing, and that turned out to be even more interesting.

The System

I work on a hybrid search system: Cohere embeddings in pgvector for semantic similarity, then a keyword re-ranking layer on top. Django, PostgreSQL, Bedrock. The kind of search stack a lot of teams are probably running right now.

The ranking logic lives in one file: utils.py. It takes the top 100 vector search candidates, scores them on keyword and tag matches across location, activity, and general terms, normalizes everything with z-scores, applies adaptive correlation-based weighting to avoid double-counting, and combines it all into a final score: similarity * (1 + ke
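To make the re-ranking pipeline concrete, here is a minimal sketch of the normalize-then-weight step described above. The function names and the exact weighting rule are my own assumptions for illustration, not the article's utils.py; in particular, "shrink each component by its mean correlation with the others" is just one plausible way to implement correlation-based double-counting avoidance.

```python
import numpy as np

def zscore(x):
    # Normalize each score column to zero mean, unit variance;
    # guard against a zero-variance column.
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - x.mean(axis=0)) / std

def adaptive_weights(scores):
    # Hypothetical correlation-based weighting: start from equal
    # weights, then shrink each component by its mean absolute
    # correlation with the other components, so two signals that
    # mostly agree are not counted twice. Assumes correlations
    # are not all perfect (weights would collapse to zero).
    corr = np.corrcoef(scores, rowvar=False)
    n = corr.shape[0]
    mean_corr = (np.abs(corr).sum(axis=1) - 1.0) / (n - 1)
    w = 1.0 - mean_corr
    return w / w.sum()

def keyword_score(location, activity, general):
    # Stack the three per-candidate keyword signals from the
    # article (location, activity, general terms), z-score them,
    # and combine with the adaptive weights.
    raw = np.column_stack([location, activity, general])
    z = zscore(raw)
    return z @ adaptive_weights(z)
```

The resulting per-candidate keyword score would then be folded into the final ranking formula alongside the vector similarity.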
