MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected


via Dev.to / plasmon

## Start with the benchmarks

In a previous article, I compared three Qwen3.5 models on the same hardware. Here are the MoE-relevant numbers.

Test environment: RTX 4060 8GB / Ryzen 7 / 32GB DDR5 / llama.cpp / Q4_K_M

| Model | Speed (t/s) | VRAM | GPU% | CPU% | RAM | ngl |
|---|---|---|---|---|---|---|
| Qwen3.5-9B | 33.0 | 7.1GB | 91% | 32% | 22.6GB | 99 (all layers GPU) |
| Qwen3.5-27B | 3.57 | 7.7GB | 60% | 74% | 28.3GB | 24 (24/58 layers GPU) |
| Qwen3.5-35B-A3B | 8.61 | 7.6GB | 95% | 65% | 30.8GB | 99 (all layers GPU) |

All three models consume nearly the same VRAM (7.1-7.7GB), yet speed varies by nearly 10x: 33.0, 3.57, and 8.61 t/s. The critical comparison is dense 27B vs MoE 35B-A3B: the 35B model is 2.4x faster than the 27B, despite having more parameters.

## Why 35B beats 27B

The answer is in the GPU utilization numbers.

Dense 27B (GPU 60%): its Q4_K_M file is about 16GB, which can't fit in 8GB of VRAM, so only 24 of 58 layers run on the GPU (ngl=24). The remaining 34 layers run on the CPU. The GPU finishes its portion and sits idle waiting for the CPU; 60% GPU utilization means the GPU is wasting roughly 40% of its time.
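The headline ratios follow directly from the measured speeds in the table. A quick sanity check (numbers taken from the table above):

```python
# Measured decode speeds (tokens/s) from the benchmark table.
speeds = {
    "Qwen3.5-9B": 33.0,
    "Qwen3.5-27B": 3.57,
    "Qwen3.5-35B-A3B": 8.61,
}

# MoE 35B-A3B vs dense 27B, same ~7.x GB VRAM footprint.
moe_vs_dense = speeds["Qwen3.5-35B-A3B"] / speeds["Qwen3.5-27B"]
# Spread between the fastest and slowest model on this hardware.
spread = speeds["Qwen3.5-9B"] / speeds["Qwen3.5-27B"]

print(f"MoE 35B-A3B vs dense 27B: {moe_vs_dense:.1f}x")  # -> 2.4x
print(f"9B vs 27B spread:         {spread:.1f}x")        # -> 9.2x
```

The 2.4x figure is why the MoE result is surprising: on paper the 35B model is the biggest of the three, yet it sits between the 9B and 27B in speed, not below them.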
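The partial-offload penalty can be made concrete with a toy per-token cost model. The per-layer times below are illustrative assumptions (not measurements from the article); the point is only that when layers execute sequentially, the slow CPU portion dominates total token time regardless of how fast the GPU portion is:

```python
# Toy cost model for split execution in llama.cpp-style offloading.
# Per-layer latencies are ASSUMED for illustration, not measured.
GPU_MS_PER_LAYER = 0.8  # assumed GPU time per transformer layer
CPU_MS_PER_LAYER = 7.0  # assumed CPU time per layer (much slower)

def token_ms(total_layers: int, gpu_layers: int) -> float:
    """Per-token latency: GPU layers run first, then CPU layers."""
    cpu_layers = total_layers - gpu_layers
    return gpu_layers * GPU_MS_PER_LAYER + cpu_layers * CPU_MS_PER_LAYER

all_gpu = token_ms(58, 58)  # ngl=99: every layer on the GPU
split = token_ms(58, 24)    # ngl=24: 24 GPU layers, 34 CPU layers

print(f"all-GPU: {1000 / all_gpu:.1f} t/s")
print(f"24/58:   {1000 / split:.1f} t/s")
```

Under these made-up numbers the 34 CPU layers account for over 90% of each token's latency, which is exactly the pattern the 27B's 60% GPU / 74% CPU utilization split suggests: the GPU spends a large fraction of every token waiting for the CPU to finish its layers.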

Continue reading on Dev.to
