
NeurIPS 2025 Proved It: Every LLM Says the Same Thing — Here's the Fix
"Write a metaphor about time." Ask 25 different language models this question. Sample 50 responses from each. What do you get? 1,250 responses that collapse into exactly two metaphors: "time is a river" and "time is a weaver." That's it. GPT-4o, Claude, Llama, Qwen, Mixtral, DeepSeek — models built by different companies, trained on different data, with different architectures — all converging on the same two ideas.

This isn't a toy example. It's a finding from Artificial Hivemind, a paper accepted as an oral presentation at NeurIPS 2025 by researchers from the University of Washington, CMU, Stanford, and AI2.

The Scale of the Problem

The researchers built Infinity-Chat, a dataset of 26,000 real-world open-ended queries — the kind with no single correct answer. They tested 70+ models (25 in the main paper) and found two devastating patterns:

1. Intra-Model Repetition

Sample the same model 50 times with identical parameters (top-p=0.9, temperature=1.0). In 79% of cases, the average
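The repetition check described above can be sketched in a few lines. This is an illustrative diversity measure (distinct-response count plus mean pairwise token-level Jaccard similarity), not the paper's exact metric, and the toy sample batch below stands in for real model outputs:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two responses."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def diversity_report(responses: list[str]) -> dict:
    """Summarize how much a batch of samples collapses onto the same ideas."""
    distinct = len({r.strip().lower() for r in responses})
    pairs = list(combinations(responses, 2))
    mean_sim = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "n_samples": len(responses),
        "n_distinct": distinct,
        "mean_pairwise_jaccard": round(mean_sim, 3),
    }

# Toy batch mimicking the collapse: 50 samples, only two underlying metaphors.
samples = ["Time is a river."] * 30 + ["Time is a weaver."] * 20
report = diversity_report(samples)
print(report)  # n_distinct == 2 despite 50 samples: near-total collapse
```

A high mean pairwise similarity with only a handful of distinct responses is exactly the "hivemind" signature: the model is sampling, but not exploring.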



