
Karpathy's Autoresearch: Improving Agentic Coding Skills
Introduction

Recently, Andrej Karpathy made his autoresearch workflow public: https://github.com/karpathy/autoresearch . The idea is to autonomously improve a model's training process based on experiment results. Using Claude Code, you run this loop for hours or days and end up with a better model. The whole flow is described as a skill in the program.md file: https://github.com/karpathy/autoresearch/blob/master/program.md

I'm not training any LLMs for work or even as a hobby, but I do a lot of coding, now mostly with Claude Code. To generate high-quality code that consistently follows conventions and standards, I use multiple skills, memory files, sub-agents, hooks, and so on; let's call it an agentic harness. However, I have been evaluating this harness rather naively, not based on experiments or metrics; let's say, not scientifically. The usual approach has been: test best practices that feel useful -> if they work -> incorporate them into the workflow. Or, if issues are caught during human review
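The autoresearch loop described above (propose a change, run an experiment, keep the change only if results improve) can be sketched roughly as follows. This is a minimal illustration, not Karpathy's actual code: the function names, the `lr` hyperparameter, and the scoring function are all hypothetical stand-ins for a real training run.

```python
# Hypothetical sketch of an autoresearch-style loop: propose a change,
# run an experiment, keep the change only if the metric improves.
# All names here are illustrative assumptions, not the repo's API.

import random

def propose_change(config):
    # Assumption: tweak one hyperparameter at random.
    candidate = dict(config)
    candidate["lr"] = config["lr"] * random.choice([0.5, 2.0])
    return candidate

def run_experiment(config):
    # Stand-in for a real training run; returns a score where higher
    # is better. In this toy example the "best" lr is 0.001.
    return -abs(config["lr"] - 0.001)

def autoresearch_loop(config, steps=10):
    best_score = run_experiment(config)
    for _ in range(steps):
        candidate = propose_change(config)
        score = run_experiment(candidate)
        if score > best_score:  # keep only improvements
            config, best_score = candidate, score
    return config

final = autoresearch_loop({"lr": 0.01}, steps=50)
print(final)
```

In the real workflow the "experiment" is an actual training run evaluated on held-out metrics, and the loop runs unattended for hours or days; the structure, though, is this same propose-measure-keep cycle.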




