99.8% of LLM Inference Power Isn't Spent on Computation

via Dev.to (plasmon)

When people debate LLM inference bottlenecks, bandwidth and VRAM dominate the conversation. But of the five walls identified by LIMINAL (Davies et al., arXiv:2507.14397), the hardest one to break through is power. Bandwidth scales by widening the bus (HBM4 did exactly that). Capacity scales by stacking more dies. But power is chained directly to physics. The era when process shrinks automatically reduced power consumption ended around 2006, when Dennard scaling collapsed.

# The collapse of Dennard Scaling

```python
dennard_scaling = {
    "1970-2006": {
        "rule": "Smaller transistors -> lower voltage -> constant power per area",
        "result": "Performance/W improved for free with every node shrink",
        "benefit": "Moore's Law + Dennard's Law in sync -> exponential perf gains",
    },
    "2006-present": {
        "reality": "Voltage can't drop further (subthreshold leakage)",
        "result": "Shrinking transistors no longer reduces power per area",
    },
}
```
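The collapse can be made concrete with the classic CMOS dynamic-power relation, P = alpha * C * V^2 * f. Below is a minimal sketch (normalized, illustrative numbers for a 0.7x linear shrink; the `dynamic_power` helper and the specific scaling factors are assumptions, not from the article) showing why power density stayed flat while voltage could still drop, and doubles once voltage is pinned:

```python
# Classic CMOS dynamic-power relation: P = alpha * C * V^2 * f.
# Normalized, illustrative scaling factors for a 0.7x linear shrink;
# these numbers are assumptions for the sketch, not from the article.

def dynamic_power(alpha: float, c: float, v: float, f: float) -> float:
    """Switching power: activity factor * capacitance * voltage^2 * frequency."""
    return alpha * c * v * v * f

base = dynamic_power(alpha=0.1, c=1.0, v=1.0, f=1.0)  # normalized baseline node

# Dennard era: C and V both scale ~0.7x, f rises ~1.4x,
# and area per transistor shrinks to 0.49x of the old node.
dennard = dynamic_power(0.1, 0.7, 0.7, 1.4)
print(f"Dennard-era power density: {dennard / base / 0.49:.2f}x")      # ~0.98x (flat)

# Post-2006: subthreshold leakage pins V near threshold,
# so only C shrinks and f rises -- power density doubles per shrink.
post = dynamic_power(0.1, 0.7, 1.0, 1.4)
print(f"Post-Dennard power density: {post / base / 0.49:.2f}x")        # ~2.00x
```

The V^2 term is the whole story: as long as voltage scaled with feature size, the cube of savings (C * V^2) cancelled the density increase; once V froze, every shrink became a net power-density increase.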

Continue reading on Dev.to