TiDAR: Think in Diffusion, Talk in Autoregression (Paper Analysis)
Paper: https://arxiv.org/abs/2511.08923

Abstract: Diffusion language models hold the promise of fast parallel generation, while autoregressive (AR) models typically excel in quality due to their causal structure aligning naturally with language modeling. This raises a fundamental question: can we achieve a synergy with high throughput, higher GPU utilization, and AR-level quality? Existing methods fail to effectively balance these two aspects, either prioritizing AR using a weaker model for sequential drafting (speculative decoding), leading to lower drafting efficiency, or using some form of left-to-right (AR-like) decoding logic for diffusion, which still suffers from quality degradation and forfeits its potential parallelizability. We introduce TiDAR, a sequence-level hybrid architecture that drafts tokens (Thinking) in Diffusion and samples final outputs (Talking) AutoRegressively, all within a single forward pass using specially designed structured attention masks. This design ex…
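The abstract only sketches the mechanism, but the key idea is the structured attention mask that lets drafting and AR sampling share one forward pass. Below is a minimal, hypothetical illustration of what such a mask could look like, assuming the committed prefix attends causally to itself while the diffusion draft slots attend to the full prefix and bidirectionally to each other; the function name and exact layout are assumptions based on the abstract, not the paper's actual design.

```python
import torch


def tidar_style_attention_mask(n_prefix: int, n_draft: int) -> torch.Tensor:
    """Illustrative structured attention mask (assumption, not the paper's exact scheme).

    Positions 0 .. n_prefix-1 are committed (AR) tokens: causal attention.
    Positions n_prefix .. n_prefix+n_draft-1 are diffusion draft slots:
    they see the whole prefix and each other, so all drafts can be
    denoised in parallel within the same forward pass.
    Returns a boolean matrix where True means "may attend".
    """
    n = n_prefix + n_draft
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Causal (lower-triangular) attention within the committed prefix.
    mask[:n_prefix, :n_prefix] = torch.tril(torch.ones(n_prefix, n_prefix)).bool()
    # Draft slots attend to the entire committed prefix ...
    mask[n_prefix:, :n_prefix] = True
    # ... and bidirectionally to one another (diffusion-style denoising).
    mask[n_prefix:, n_prefix:] = True
    return mask


if __name__ == "__main__":
    # 4 committed tokens followed by 3 parallel draft slots.
    print(tidar_style_attention_mask(n_prefix=4, n_draft=3).int())
```

In this toy picture, one forward pass over the combined sequence both scores the prefix autoregressively (for verifying/accepting previously drafted tokens) and fills the draft block in parallel, which is how the speedup over token-by-token decoding would arise.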
Watch the paper analysis on Yannic Kilcher's YouTube channel.