I Hid the AI’s “Thinking” in Plain Sight: Dual-Channel Streaming for an AI Search Chatbot That Works Mid-Call (Series Part 9)


By Daniel Romitelli, via Dev.to Python

I watched a recruiter share their screen on a client call and realized the worst possible thing was happening: the assistant’s raw “thinking” was spilling onto the screen like debug logs. The content wasn’t wrong; it was just the kind of internal narration you never want a client to read while you’re trying to sound decisive.

This is Part 9 of my series “How to Architect an Enterprise AI System (And Why the Engineer Still Matters)”. In Part 8, I talked about routing search across Azure AI Search, pgvector, and the CRM as a live fallback. This post is about what happened next: once the answers got good, the delivery became the product.

The core decision: progressive disclosure via dual-channel streaming (thinking + results) with an interruptible UX. I stream the model’s THINKING tokens on one channel, stream QUERY_RESULT events on another, and build candidate cards from structured events, not from text.

The key insight (and why the naive approach fails)

The naive approach to streaming
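The dual-channel idea can be sketched roughly as follows. The channel names (THINKING, QUERY_RESULT) come from the excerpt above; everything else, including the event shapes, the `CandidateCard` type, and the consumer function, is an assumption on my part, since the full article is truncated here. The point the sketch illustrates is that thinking tokens and result events travel as tagged events on separate channels, and the UI builds cards only from the structured payloads, never by parsing free text.

```python
# Minimal sketch of dual-channel streaming: one stream of tagged events,
# routed by channel so "thinking" stays hidden while results build typed cards.
# All names and event shapes below are illustrative assumptions.
import asyncio
from dataclasses import dataclass


@dataclass
class CandidateCard:
    """A result card built from a structured event payload, not from text."""
    id: str
    title: str
    score: float


async def model_stream():
    # Stand-in for the model/search pipeline: it interleaves free-text
    # THINKING tokens with structured QUERY_RESULT events.
    yield {"channel": "THINKING", "token": "Searching CRM for senior "}
    yield {"channel": "THINKING", "token": "backend candidates..."}
    yield {"channel": "QUERY_RESULT",
           "payload": {"id": "c-101", "title": "Senior Backend Engineer", "score": 0.92}}
    yield {"channel": "QUERY_RESULT",
           "payload": {"id": "c-214", "title": "Staff Platform Engineer", "score": 0.87}}


async def consume(stream):
    """Route each event by channel: thinking tokens accumulate in a log
    that is never rendered to the client view; results become typed cards."""
    thinking_log, cards = [], []
    async for event in stream:
        if event["channel"] == "THINKING":
            thinking_log.append(event["token"])  # hidden by default
        elif event["channel"] == "QUERY_RESULT":
            cards.append(CandidateCard(**event["payload"]))
    return thinking_log, cards


thinking, cards = asyncio.run(consume(model_stream()))
print([c.title for c in cards])
```

Because the cards are constructed from structured payloads, a screen-shared client view can subscribe only to the QUERY_RESULT channel, while the THINKING channel feeds an operator-only pane; interrupting the stream mid-call simply stops consuming events without corrupting any partially parsed text.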

Continue reading on Dev.to Python


