I Turned Karpathy's Autoresearch Into a Skill That Optimizes Anything — Here Is the Architecture

via Dev.to, by Reza Rezvani

Karpathy released autoresearch last week. 31,000 stars. 100 ML experiments overnight on one GPU. Everyone wrote about the ML training loop. I saw something different: a pattern.

One file. One metric. One loop. Modify → Evaluate → Keep or Discard → Repeat.

That pattern has nothing to do with machine learning. So I built a skill that applies it to:

→ API response time (benchmark_speed evaluator)
→ Bundle size (benchmark_size evaluator)
→ Headline click-through (LLM judge evaluator)
→ System prompt quality (LLM judge evaluator)
→ Test pass rate, build speed, memory usage

Works across 11 tools: Claude Code, Codex, Gemini CLI, Cursor, Windsurf, OpenClaw, and more.

The hardest problem: evaluating things that are not numbers. Headlines do not come with a val_bpb metric. Solution: LLM judges using the agent's own subscription.

Critical constraint: the agent cannot modify its own evaluator. (The alignment problem in miniature.)

What I have not done yet: run 100 experiments…
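To make the pattern concrete, here is a minimal sketch of the Modify → Evaluate → Keep or Discard → Repeat loop as a generic hill climber. The names `optimize`, `mutate`, and `evaluate` are hypothetical illustrations, not the skill's actual API; the point is that any artifact with a score function plugs in.

```python
import random

def optimize(artifact, mutate, evaluate, steps=100):
    """Hill-climbing loop: modify -> evaluate -> keep or discard -> repeat.

    `mutate` proposes a variant of the artifact; `evaluate` returns a score
    where higher is better. Both are pluggable, which is what lets the same
    loop optimize response times, bundle sizes, or headlines.
    """
    best_score = evaluate(artifact)
    for _ in range(steps):
        candidate = mutate(artifact)
        score = evaluate(candidate)
        if score > best_score:
            # Keep: the candidate becomes the new baseline.
            artifact, best_score = candidate, score
        # Discard: otherwise the candidate is simply thrown away.
    return artifact, best_score

# Toy usage: "optimize" a number toward 42 by random nudges.
random.seed(0)
result, score = optimize(
    artifact=0.0,
    mutate=lambda x: x + random.uniform(-1, 1),
    evaluate=lambda x: -abs(x - 42),  # higher (closer to 0) is better
    steps=5000,
)
```

For non-numeric targets like headlines, the same loop works: only `evaluate` changes, swapping the benchmark function for an LLM judge that returns a score.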

Continue reading on Dev.to
