Top 5 AI Agent Eval Tools After Promptfoo's Exit
How-To · Tools

via Dev.to · Nebula

TL;DR: DeepEval for pytest-native open-source evaluation. Braintrust for full-lifecycle eval with CI/CD quality gates. Arize Phoenix for vendor-neutral, self-hosted tracing and eval. LangSmith if you are all-in on LangChain. Comet Opik for budget-conscious teams running high-volume traces.

Promptfoo Is Gone. Now What?

On March 9, OpenAI acquired Promptfoo for $86 million. Promptfoo was the most widely used open-source LLM eval and red-teaming CLI -- 10,800 GitHub stars, used by thousands of teams testing prompts, model outputs, and agent behavior across every major provider. The acquisition raises an immediate question for anyone using non-OpenAI models: will Promptfoo stay vendor-neutral? The team says yes. The incentive structure says maybe not.

Whether you are running agents on Nebula, LangGraph, CrewAI, or your own framework, eval tooling is non-negotiable. Agents that call tools, make decisions, and interact with production systems need automated testing that catches failures before…
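To make "pytest-native" concrete, here is a minimal sketch of a DeepEval test following the LLMTestCase / AnswerRelevancyMetric / assert_test pattern from its quickstart. The input/output strings and the 0.7 threshold are illustrative assumptions, and exact signatures may differ across versions.

```python
# Minimal DeepEval sketch: a pytest-style test that scores an agent's
# answer for relevancy and fails below a threshold. Values are illustrative;
# run with `deepeval test run test_agent.py` (or plain pytest).
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_agent_answer_relevancy():
    # Hypothetical input/output captured from your own agent framework.
    test_case = LLMTestCase(
        input="What are the shipping options for my order?",
        actual_output="We offer standard (5-7 days) and express (1-2 days) shipping.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)  # assumed passing bar
    assert_test(test_case, [metric])  # fails the test if the score is too low
```

Because this is plain pytest, the same file can run as a CI/CD quality gate on every commit -- the workflow the hosted tools above also target.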

Continue reading on Dev.to


