I Hid the AI’s “Thinking” in Plain Sight: Dual-Channel Streaming for an AI Search Chatbot That Works Mid-Call (Series Part 9)


By Daniel Romitelli, via Dev.to Python

I watched a recruiter share their screen on a client call and realized the worst possible thing was happening: the assistant’s raw “thinking” was spilling onto the screen like debug logs. The content wasn’t wrong; it was just the kind of internal narration you never want a client to read while you’re trying to sound decisive.

This is Part 9 of my series “How to Architect an Enterprise AI System (And Why the Engineer Still Matters)”. In Part 8, I talked about routing search across Azure AI Search, pgvector, and the CRM as a live fallback. This post is about what happened next: once the answers got good, the delivery became the product.

The core decision: progressive disclosure via dual-channel streaming (thinking + results) with an interruptible UX. I stream the model’s THINKING tokens on one channel, stream QUERY_RESULT events on another, and build candidate cards from structured events, not from text.

The key insight (and why the naive approach fails)

The naive approach to streaming
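The dual-channel idea can be sketched roughly as follows. The channel names (THINKING, QUERY_RESULT) come from the excerpt above; everything else, including the event shapes, the `CandidateCard` type, and the consumer function, is an assumption on my part, since the full article is truncated here. The point the sketch illustrates is that thinking tokens and result events travel as tagged events on separate channels, and the UI builds cards only from the structured payloads, never by parsing free text.

```python
# Minimal sketch of dual-channel streaming: one stream of tagged events,
# routed by channel so "thinking" stays hidden while results build typed cards.
# All names and event shapes below are illustrative assumptions.
import asyncio
from dataclasses import dataclass


@dataclass
class CandidateCard:
    """A result card built from a structured event payload, not from text."""
    id: str
    title: str
    score: float


async def model_stream():
    # Stand-in for the model/search pipeline: it interleaves free-text
    # THINKING tokens with structured QUERY_RESULT events.
    yield {"channel": "THINKING", "token": "Searching CRM for senior "}
    yield {"channel": "THINKING", "token": "backend candidates..."}
    yield {"channel": "QUERY_RESULT",
           "payload": {"id": "c-101", "title": "Senior Backend Engineer", "score": 0.92}}
    yield {"channel": "QUERY_RESULT",
           "payload": {"id": "c-214", "title": "Staff Platform Engineer", "score": 0.87}}


async def consume(stream):
    """Route each event by channel: thinking tokens accumulate in a log
    that is never rendered to the client view; results become typed cards."""
    thinking_log, cards = [], []
    async for event in stream:
        if event["channel"] == "THINKING":
            thinking_log.append(event["token"])  # hidden by default
        elif event["channel"] == "QUERY_RESULT":
            cards.append(CandidateCard(**event["payload"]))
    return thinking_log, cards


thinking, cards = asyncio.run(consume(model_stream()))
print([c.title for c in cards])
```

Because the cards are constructed from structured payloads, a screen-shared client view can subscribe only to the QUERY_RESULT channel, while the THINKING channel feeds an operator-only pane; interrupting the stream mid-call simply stops consuming events without corrupting any partially parsed text.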

Continue reading on Dev.to Python


