Top 5 AI Agent Eval Tools After Promptfoo's Exit
How-To · Tools

via Dev.to · Nebula

TL;DR: DeepEval for pytest-native open-source evaluation. Braintrust for full-lifecycle eval with CI/CD quality gates. Arize Phoenix for vendor-neutral, self-hosted tracing and eval. LangSmith if you are all-in on LangChain. Comet Opik for budget-conscious teams running high-volume traces.

Promptfoo Is Gone. Now What?

On March 9, OpenAI acquired Promptfoo for $86 million. Promptfoo was the most widely used open-source LLM eval and red-teaming CLI -- 10,800 GitHub stars, used by thousands of teams testing prompts, model outputs, and agent behavior across every major provider. The acquisition raises an immediate question for anyone using non-OpenAI models: will Promptfoo stay vendor-neutral? The team says yes. The incentive structure says maybe not.

Whether you are running agents on Nebula, LangGraph, CrewAI, or your own framework, eval tooling is non-negotiable. Agents that call tools, make decisions, and interact with production systems need automated testing that catches failures before…
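To make "pytest-native" concrete, here is a minimal sketch of a DeepEval test following the LLMTestCase / AnswerRelevancyMetric / assert_test pattern from its quickstart. The input/output strings and the 0.7 threshold are illustrative assumptions, and exact signatures may differ across versions.

```python
# Minimal DeepEval sketch: a pytest-style test that scores an agent's
# answer for relevancy and fails below a threshold. Values are illustrative;
# run with `deepeval test run test_agent.py` (or plain pytest).
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_agent_answer_relevancy():
    # Hypothetical input/output captured from your own agent framework.
    test_case = LLMTestCase(
        input="What are the shipping options for my order?",
        actual_output="We offer standard (5-7 days) and express (1-2 days) shipping.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)  # assumed passing bar
    assert_test(test_case, [metric])  # fails the test if the score is too low
```

Because this is plain pytest, the same file can run as a CI/CD quality gate on every commit -- the workflow the hosted tools above also target.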

Continue reading on Dev.to


