
AI Guardrail Poisoning: Someone Rewrote McKinsey’s Lilli With One SQL Query
Someone rewrote McKinsey's AI chatbot's guardrails with a single SQL UPDATE statement. No deployment needed. No code change. No one noticed until a security researcher wrote it up. That's the story of Lilli, McKinsey's internal AI assistant used by thousands of consultants. A researcher found a SQL injection flaw in the application layer. Because the flaw was read-write, an attacker could silently rewrite the prompts that controlled how Lilli behaved: what guardrails it followed, how it cited sources, what it refused to do. The Register covered it last week. "No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call." The holes are now patched. But the larger threat, as the researcher told The Register, remains. This is what I'd call guardrail poisoning. And it's more common than the industry wants to admit. TL;DR McKinsey's Lilli AI had its behavioral guardrails silently rewritten via SQL injection The attack vector: guardrails stored as mutabl
Continue reading on Dev.to
Opens in a new tab




