
Can Spiking Neural Networks Kill the GPU? 3 Papers Show the Reality
GPU Dominance in AI Inference Is Getting Challenged

Running llama.cpp on an RTX 4060, the fans scream: 95 W for 38 tok/s. The results are fine, but the moment you talk power efficiency, things get awkward. An M4 Mac mini hits the same speed at 30 W, and CUDA's brute-force approach becomes hard to defend.

Meanwhile, the biological brain runs on about 20 W, and most of that goes to maintaining membrane potentials and keeping synapses on standby. The incremental cost of "conscious thought" is less than 5% above baseline (Raichle, Science, 2006), which puts actual thinking at under 1 W. The human brain has roughly 86 billion neurons, yet only 1-2% of them fire at any given moment (Lennie, Current Biology, 2003): only the neurons that need to spike do so, and only when needed. This is fundamentally different from Transformer inference, where every parameter is active on every token.

Spiking Neural Networks (SNNs) and neuromorphic computing are trying to bring this biological design principle into hardware. Three papers show where that effort actually stands.
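To make the sparse-activation idea concrete, here is a minimal leaky integrate-and-fire (LIF) simulation sketch. Every number in it (leak factor, threshold, input statistics) is an illustrative assumption, not a value from the papers above; the point is only that in an event-driven model, work scales with the number of spikes, not with the number of neurons.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) layer.
# All parameters are illustrative assumptions, not from any cited paper.
rng = np.random.default_rng(0)

n_neurons = 1000
tau = 0.9          # membrane leak factor per timestep
threshold = 1.0    # membrane potential at which a neuron spikes
steps = 100

v = np.zeros(n_neurons)   # membrane potentials
total_spikes = 0

for _ in range(steps):
    # Weak noisy input current: most neurons stay below threshold.
    i_in = rng.normal(0.0, 0.25, n_neurons)
    v = tau * v + i_in
    spikes = v >= threshold
    v[spikes] = 0.0        # reset neurons that fired
    total_spikes += int(spikes.sum())

# Fraction of neuron-timesteps that produced a spike: this is the
# event-driven "work", analogous to the brain's 1-2% activity figure.
activity = total_spikes / (n_neurons * steps)
print(f"mean activity: {activity:.2%}")
```

Under these toy settings only a small percentage of neuron-timesteps spike, while a dense Transformer layer would touch every weight on every token regardless of input.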
Continue reading on Dev.to


