99.8% of LLM Inference Power Isn't Spent on Computation

via Dev.to (plasmon)

When people debate LLM inference bottlenecks, bandwidth and VRAM dominate the conversation. But of the five walls identified by LIMINAL (Davies et al., arXiv:2507.14397), the hardest one to break through is power. Bandwidth scales by widening the bus (HBM4 did exactly that). Capacity scales by stacking more dies. But power is chained directly to physics. The era when process shrinks automatically reduced power consumption ended around 2006, when Dennard scaling collapsed.

# The collapse of Dennard Scaling

```python
dennard_scaling = {
    "1970-2006": {
        "rule": "Smaller transistors -> lower voltage -> constant power per area",
        "result": "Performance/W improved for free with every node shrink",
        "benefit": "Moore's Law + Dennard's Law in sync -> exponential perf gains",
    },
    "2006-present": {
        "reality": "Voltage can't drop further (subthreshold leakage)",
        "result": "Shrinking transistors no longer reduces power per area",
    },
}
```
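The collapse can be made concrete with the classic CMOS dynamic-power relation, P = alpha * C * V^2 * f. Below is a minimal sketch (normalized, illustrative numbers for a 0.7x linear shrink; the `dynamic_power` helper and the specific scaling factors are assumptions, not from the article) showing why power density stayed flat while voltage could still drop, and doubles once voltage is pinned:

```python
# Classic CMOS dynamic-power relation: P = alpha * C * V^2 * f.
# Normalized, illustrative scaling factors for a 0.7x linear shrink;
# these numbers are assumptions for the sketch, not from the article.

def dynamic_power(alpha: float, c: float, v: float, f: float) -> float:
    """Switching power: activity factor * capacitance * voltage^2 * frequency."""
    return alpha * c * v * v * f

base = dynamic_power(alpha=0.1, c=1.0, v=1.0, f=1.0)  # normalized baseline node

# Dennard era: C and V both scale ~0.7x, f rises ~1.4x,
# and area per transistor shrinks to 0.49x of the old node.
dennard = dynamic_power(0.1, 0.7, 0.7, 1.4)
print(f"Dennard-era power density: {dennard / base / 0.49:.2f}x")      # ~0.98x (flat)

# Post-2006: subthreshold leakage pins V near threshold,
# so only C shrinks and f rises -- power density doubles per shrink.
post = dynamic_power(0.1, 0.7, 1.0, 1.4)
print(f"Post-Dennard power density: {post / base / 0.49:.2f}x")        # ~2.00x
```

The V^2 term is the whole story: as long as voltage scaled with feature size, the cube of savings (C * V^2) cancelled the density increase; once V froze, every shrink became a net power-density increase.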

Continue reading on Dev.to