
What Karpathy's Autoresearch Unlocked for Me
I'm not a data scientist. I've trained a few models before: simple classification problems, with AI writing the Python and me running the iterations. It worked. I got confident. Then a friend asked for help with something harder.

Three Weeks at 0.58

The problem involved predicting an outcome from a mix of CRM data and call recordings. Not trivial, but not exotic either.

A quick primer on AUC, the metric I'll use throughout. Imagine your model looks at two random people: one where the answer is yes, one where it's no. AUC measures how often the model correctly ranks the yes above the no. A score of 0.5 means random guessing; a score of 1.0 means always right.

I tried everything I knew: XGBoost, feature engineering, extracting features from transcripts using AI models, trying different combinations. I assumed more data meant better results; that's how it's supposed to work. Instead, every time I added more features, the AUC dropped. Below 0.5 sometimes, meaning the model was now actively mis-ranking cases, worse than random guessing.
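That pairwise definition of AUC translates almost literally into code. Here's a minimal sketch (my own illustration, not code from the project) that counts how often a random positive outranks a random negative:

```python
import itertools

def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs where the
    positive example gets the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p, n in itertools.product(pos, neg):
        if p > n:
            wins += 1.0      # model ranked the "yes" above the "no"
        elif p == n:
            wins += 0.5      # tie: no better than a coin flip
    return wins / (len(pos) * len(neg))

# Toy example: 3 of 4 pairs ranked correctly.
print(pairwise_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

An AUC below 0.5 means the model loses most of these pairwise comparisons: flipping its predictions would literally do better. In practice you'd use `sklearn.metrics.roc_auc_score`, which computes the same quantity efficiently.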

