
Why I built a neutral LLM eval framework after Promptfoo joined OpenAI
A few weeks ago, Promptfoo — one of the most popular open-source LLM evaluation frameworks — joined OpenAI. I don't think that's inherently bad. But it created a real problem for the ecosystem: the tools we use to evaluate AI systems are increasingly owned by the same companies that build those AI systems. That's a conflict of interest that matters.

So I built Rubric — an independent, MIT-licensed LLM and AI agent evaluation framework. No corporate parent. Open source forever. Here's what I learned building it, and why I think agent trace evaluation is the missing piece in most teams' LLM testing story.

The gap: everyone evaluates output, nobody evaluates the journey

Most LLM eval frameworks work like this: input → model → output → did the output match expected? That's fine for simple Q&A. But if you're building an AI agent — something that calls tools, makes decisions, and takes multi-step actions — the final output is only part of the story. What if the agent got the right answer but
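The difference between output-only evaluation and trace evaluation can be sketched in a few lines. This is an illustrative sketch, not Rubric's actual API — the `AgentTrace`, `ToolCall`, and evaluator names here are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical types for illustration only -- not Rubric's real API.

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class AgentTrace:
    steps: list[ToolCall]   # ordered record of what the agent did
    final_output: str       # the answer the user actually sees

def eval_output_only(trace: AgentTrace, expected: str) -> bool:
    # The classic pipeline: only the final answer is inspected.
    return trace.final_output == expected

def eval_trace(trace: AgentTrace, expected: str,
               allowed_tools: set[str], max_steps: int) -> bool:
    # Trace-level check: the journey matters, not just the answer.
    if not eval_output_only(trace, expected):
        return False
    if len(trace.steps) > max_steps:
        return False  # agent wandered or looped
    return all(call.name in allowed_tools for call in trace.steps)

# Right answer, but the agent took a dangerous detour on the way.
trace = AgentTrace(
    steps=[ToolCall("search", {"q": "refund policy"}),
           ToolCall("delete_db", {})],
    final_output="Refunds within 30 days.",
)
print(eval_output_only(trace, "Refunds within 30 days."))  # True
print(eval_trace(trace, "Refunds within 30 days.",
                 allowed_tools={"search"}, max_steps=3))   # False
```

Output-only evaluation calls this agent a pass; trace evaluation catches the forbidden `delete_db` call. That gap is exactly what most teams' test suites miss.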
Continue reading on Dev.to




