
I built an open-source real-time LLM hallucination guardrail — here are the benchmarks
## What is Director-Class AI?

An open-source Python library that guards LLM output in real time. It watches tokens as they stream and halts generation the moment it detects a hallucination. It uses NLI (Natural Language Inference via DeBERTa/FactCG) and optional RAG knowledge grounding to score each claim against source documents.

```shell
pip install director-ai
```

Two-line integration:

```python
import openai
from director_ai import guard

client = guard(openai.OpenAI())  # wraps any OpenAI/Anthropic client
```

## Benchmarks (measured, not aspirational)

| Metric | Value | Conditions |
| --- | --- | --- |
| Balanced accuracy | 75.8% | FactCG on LLM-AggreFact (29,320 samples) |
| GPU latency | 14.6 ms/pair | GTX 1060, ONNX, batch=16 |
| L40S latency | 0.5 ms/pair | FP16, batch=32 |
| E2E catch rate | 90.7% | Hybrid mode, 600 HaluEval traces |
| Rust BM25 speedup | 10.2x | Over pure Python implementation |

## Framework Integrations

LangChain, LlamaIndex, LangGraph, CrewAI, Haystack, DSPy, Semantic Kernel, and SDK Guard (wraps OpenAI/Anthropic/Bedrock/Gemini/Cohere clients).

## Honest Limitations

NLI-onl
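To make the "halts generation the moment it detects a hallucination" behavior concrete, here is a minimal sketch of a streaming guard loop. Everything in it is illustrative: `score_claim` is a toy word-overlap stand-in for the NLI entailment model the library actually uses, and `guarded_stream` and the 0.5 threshold are hypothetical names, not director-ai's real API.

```python
# Hypothetical sketch of a streaming hallucination guard: buffer tokens into
# sentences, score each completed sentence against source documents, and halt
# generation when a sentence's support score drops below a threshold.
# `score_claim` stands in for an NLI model's entailment probability; names
# and the threshold are illustrative, not director-ai's actual API.

def score_claim(claim: str, sources: list[str]) -> float:
    """Toy scorer: fraction of claim words found in any source document.
    A real guard would use DeBERTa/FactCG entailment scores instead."""
    words = set(claim.lower().split())
    if not words:
        return 1.0
    supported = {w for w in words if any(w in s.lower() for s in sources)}
    return len(supported) / len(words)

def guarded_stream(tokens, sources, threshold=0.5):
    """Yield tokens sentence by sentence; stop at the first unsupported one."""
    buffer = []
    for tok in tokens:
        buffer.append(tok)
        if tok.endswith((".", "!", "?")):       # crude sentence boundary
            sentence = "".join(buffer)
            if score_claim(sentence, sources) < threshold:
                return                          # halt: drop the bad sentence
            yield from buffer
            buffer = []
    yield from buffer                           # trailing partial sentence

sources = ["the eiffel tower is in paris."]
tokens = ["The", " Eiffel", " Tower", " is", " in", " Paris", ".",
          " It", " was", " built", " on", " Mars", "."]
out = "".join(guarded_stream(tokens, sources))  # grounded sentence passes,
                                                # the Mars claim is cut
```

In this sketch the unsupported sentence is held back entirely; a lower-latency variant could stream tokens immediately and only signal a halt at the boundary, trading a leaked sentence for responsiveness.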
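For context on the Rust BM25 row in the benchmarks table, this is the kind of pure-Python scoring loop such a port accelerates. It is a standard Okapi BM25 with the usual k1/b defaults, written for clarity; it is not director-ai's implementation, and the function name and example corpus are made up for illustration.

```python
import math
from collections import Counter

# Minimal pure-Python Okapi BM25 ranking, the style of hot loop a Rust
# rewrite can speed up ~10x. Standard formula, illustrative only.
def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter()                       # document frequency per term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)                  # term frequency in this doc
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores

docs = ["the cat sat on the mat", "dogs chase cats", "the mat was red"]
scores = bm25_scores("cat mat", docs)
best = max(range(len(docs)), key=scores.__getitem__)  # doc matching both terms
```

The inner loop over terms and documents is pure arithmetic on small dicts, which is exactly where a compiled implementation wins over CPython.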
Continue reading on Dev.to

