DoorDash Builds LLM Conversation Simulator to Test Customer Support Chatbots at Scale

via InfoQ | Leela Kumili | 13h ago

DoorDash engineers built a simulation and evaluation flywheel for testing large language model (LLM) customer support chatbots at scale. The system generates multi-turn synthetic conversations from historical transcripts and mocked backend services, scores the outcomes with an LLM-as-judge framework, and lets teams iterate quickly on prompts, context, and system design before production deployment.
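The loop described above (simulate a conversation against mocked backend data, then judge the outcome) can be illustrated with a minimal sketch. This is not DoorDash's implementation: every name here (`Scenario`, `simulate`, `judge`, the scripted customer and the rule-based chatbot) is hypothetical, and plain Python functions stand in for the LLM-driven customer simulator, the chatbot under test, and the LLM judge so that the example runs without any model or network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]  # {"role": "customer" | "agent", "content": "..."}


@dataclass
class Scenario:
    """Hypothetical test case, e.g. distilled from a historical transcript."""
    name: str
    opening_message: str
    backend_state: Dict[str, str]  # mocked backend data instead of live services
    expected_resolution: str       # keyword the judge looks for (assumption)


def simulate(
    scenario: Scenario,
    customer: Callable[[List[Message]], Optional[str]],
    chatbot: Callable[[List[Message], Dict[str, str]], str],
    max_turns: int = 4,
) -> List[Message]:
    """Alternate chatbot and simulated-customer turns until the customer stops."""
    history: List[Message] = [
        {"role": "customer", "content": scenario.opening_message}
    ]
    for _ in range(max_turns):
        reply = chatbot(history, scenario.backend_state)
        history.append({"role": "agent", "content": reply})
        follow_up = customer(history)
        if follow_up is None:  # simulated customer is satisfied
            break
        history.append({"role": "customer", "content": follow_up})
    return history


def judge(scenario: Scenario, history: List[Message]) -> bool:
    """Stand-in for an LLM-as-judge: a keyword check instead of a rubric prompt."""
    return any(
        m["role"] == "agent"
        and scenario.expected_resolution in m["content"].lower()
        for m in history
    )


def scripted_customer(history: List[Message]) -> Optional[str]:
    """Stand-in for an LLM customer simulator: asks for a refund until offered one."""
    last = history[-1]["content"].lower()
    return None if "refund" in last else "Can I get a refund?"


def rule_based_chatbot(history: List[Message], backend: Dict[str, str]) -> str:
    """Stand-in for the chatbot under test; reads only the mocked backend."""
    last = history[-1]["content"].lower()
    if "refund" in last:
        return f"I have issued a refund for order {backend['order_id']}."
    return f"Your order {backend['order_id']} is {backend['status']}."


scenario = Scenario(
    name="late-order-refund",
    opening_message="Where is my order?",
    backend_state={"order_id": "A1", "status": "delayed"},
    expected_resolution="refund",
)
transcript = simulate(scenario, scripted_customer, rule_based_chatbot)
passed = judge(scenario, transcript)

for message in transcript:
    print(f"{message['role']}: {message['content']}")
print("judge verdict:", "pass" if passed else "fail")
```

In a real setup of this kind, the three stand-in functions would presumably be replaced by model calls, and many scenarios would be run in batch so that a change to a prompt or to the supplied context can be compared by its pass rate before deployment.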
