
99.8% of LLM Inference Power Isn't Spent on Computation
When people debate LLM inference bottlenecks, bandwidth and VRAM dominate the conversation. But of the five walls identified by LIMINAL (Davies et al., arXiv:2507.14397), the hardest one to break through is power. Bandwidth scales by widening the bus (HBM4 did exactly that). Capacity scales by stacking more dies. But power is chained directly to physics. The era when process shrinks automatically reduced power consumption ended around 2006, when Dennard scaling collapsed.

```python
# The collapse of Dennard scaling
dennard_scaling = {
    "1970-2006": {
        "rule": "Smaller transistors -> lower voltage -> constant power per area",
        "result": "Performance/W improved for free with every node shrink",
        "benefit": "Moore's Law + Dennard's Law in sync -> exponential perf gains",
    },
    "2006-present": {
        "reality": "Voltage can't drop further (subthreshold leakage)",
        "result": "Shrinking transistors no longer reduces power",
    },
}
```
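The arithmetic behind that collapse is worth making concrete. Dynamic switching power in CMOS is roughly P = C·V²·f. Under Dennard's rules, a node shrink by linear factor k cuts capacitance and voltage by 1/k while frequency grows by k, so per-transistor power falls by 1/k², exactly offsetting the k² jump in transistor density. Once voltage stalled, that cancellation vanished. A minimal sketch (normalized units, k = 1.4 as a stand-in for one full node shrink, both assumptions for illustration):

```python
def dynamic_power(c, v, f):
    """Classic CMOS dynamic switching power: P = C * V^2 * f (normalized units)."""
    return c * v**2 * f

k = 1.4  # one full node shrink: linear dimensions scale by ~1/k

baseline = dynamic_power(1.0, 1.0, 1.0)

# Pre-2006 (Dennard era): C and V both shrink by 1/k, f grows by k.
# Per-transistor power drops by 1/k^2; density rises by k^2 -> power/area flat.
per_transistor_dennard = dynamic_power(1.0 / k, 1.0 / k, k)
power_per_area_dennard = per_transistor_dennard * k**2
print(power_per_area_dennard)  # ~1.0: same power per area as the baseline

# Post-2006: voltage is pinned (subthreshold leakage), only C shrinks, f grows.
# Per-transistor power no longer falls, so power/area grows by k^2 per shrink.
per_transistor_post = dynamic_power(1.0 / k, 1.0, k)
power_per_area_post = per_transistor_post * k**2
print(power_per_area_post)  # ~1.96: nearly doubles every shrink
```

This is why every post-2006 node shrink buys density but not free efficiency: the same die area wants roughly k² more power unless architects claw it back elsewhere.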
