I Ran 60 Autoresearch Experiments on a Production Search Algorithm. Here's What Actually Happened.

via Dev.to Python · PJ Hoberman

Everyone's writing about Karpathy's autoresearch. Most of it is "here's how the loop works" or "imagine the possibilities." I wanted to see what happens when you point it at a real codebase with a real metric, not a training script. So I ran two rounds, 60 total iterations. The first round improved things. The second round found nothing, and that turned out to be even more interesting.

The System

I work on a hybrid search system: Cohere embeddings in pgvector for semantic similarity, then a keyword re-ranking layer on top. Django, PostgreSQL, Bedrock. The kind of search stack a lot of teams are probably running right now.

The ranking logic lives in one file: utils.py. It takes the top 100 vector search candidates, scores them on keyword and tag matches across location, activity, and general terms, normalizes everything with z-scores, applies adaptive correlation-based weighting to avoid double-counting, and combines it all into a final score: similarity * (1 + ke
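To make the re-ranking pipeline concrete, here is a minimal sketch of the normalize-then-weight step described above. The function names and the exact weighting rule are my own assumptions for illustration, not the article's utils.py; in particular, "shrink each component by its mean correlation with the others" is just one plausible way to implement correlation-based double-counting avoidance.

```python
import numpy as np

def zscore(x):
    # Normalize each score column to zero mean, unit variance;
    # guard against a zero-variance column.
    x = np.asarray(x, dtype=float)
    std = x.std(axis=0)
    std[std == 0] = 1.0
    return (x - x.mean(axis=0)) / std

def adaptive_weights(scores):
    # Hypothetical correlation-based weighting: start from equal
    # weights, then shrink each component by its mean absolute
    # correlation with the other components, so two signals that
    # mostly agree are not counted twice. Assumes correlations
    # are not all perfect (weights would collapse to zero).
    corr = np.corrcoef(scores, rowvar=False)
    n = corr.shape[0]
    mean_corr = (np.abs(corr).sum(axis=1) - 1.0) / (n - 1)
    w = 1.0 - mean_corr
    return w / w.sum()

def keyword_score(location, activity, general):
    # Stack the three per-candidate keyword signals from the
    # article (location, activity, general terms), z-score them,
    # and combine with the adaptive weights.
    raw = np.column_stack([location, activity, general])
    z = zscore(raw)
    return z @ adaptive_weights(z)
```

The resulting per-candidate keyword score would then be folded into the final ranking formula alongside the vector similarity.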
