
11 Ways LLMs Fail in Production (With Academic Sources)
If you use LLMs in production, you've seen these. Not random errors, but systematic failures baked into architecture and training. I documented 11 behavioral failure modes with 60+ academic sources. Here's the short version.

1. Hallucination / Confabulation

The model references a library that doesn't exist. Confidently. The worse variant: you ask "why?" and it fabricates a plausible justification for the wrong answer. Researchers prefer "confabulation" over "hallucination" because LLMs have no perceptual experience. Farquhar et al. (2024, Nature) introduced semantic entropy to detect it: sample several answers, cluster the semantically equivalent ones, and compute the entropy of the cluster distribution. High entropy means probable fabrication. Defense: RAG, Chain-of-Verification, cross-model verification.

2. Sycophancy

Ask "isn't this code wrong?" and the model says "yes, you're right" even when the code is correct. RLHF training causes this: evaluators rate agreeable answers higher, and the model learns that signal. A 2025 study found sycophantic
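The semantic-entropy idea described above can be sketched in a few lines. A caveat: the published method clusters answers by bidirectional NLI entailment; the `equivalent` function below is a crude string-normalization stand-in, and all names here are illustrative, not from the paper.

```python
from collections import Counter
import math

def semantic_entropy(answers, equivalent=None):
    """Toy semantic-entropy estimate over sampled answers.

    `equivalent` maps an answer to a cluster key. Farquhar et al.
    cluster via bidirectional NLI entailment; this default just
    normalizes the string, as a placeholder.
    """
    if equivalent is None:
        equivalent = lambda a: a.strip().lower().rstrip(".")
    counts = Counter(equivalent(a) for a in answers)
    n = len(answers)
    # Entropy of the distribution over semantic clusters.
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Consistent samples cluster together -> low entropy.
low = semantic_entropy(["Paris", "paris.", "Paris"])
# Scattered samples -> high entropy -> probable fabrication.
high = semantic_entropy(["Paris", "Lyon", "Marseille"])
```

In practice you would sample the model several times at nonzero temperature and flag answers whose entropy exceeds a tuned threshold.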
Continue reading on Dev.to




