
I Tested Every Gemma 4 Model Locally on My MacBook - What Actually Works
Audio ASR in three languages, image understanding, full-stack app generation, coding, and agentic behavior -- all running on a MacBook M4 Pro with 24 GB of RAM. Interactive version with playable audio, live charts, and the working React app: gemma4-benchmark.pages.dev

Google just released Gemma 4, its new family of open multimodal models: four sizes, Apache-2.0 licensed, supporting text, image, and audio input. I spent a day testing every variant -- real audio files, real images, code that has to compile and run. Here is my honest report.

The Gemma 4 Family

- E2B -- dense 2.3B, text/image/audio, ~4 GB at 4-bit. Phones and edge devices.
- E4B -- dense 4.5B, text/image/audio, ~5.5 GB at 4-bit. Laptops.
- 26B-A4B -- MoE, 4B active / 26B total, text/image, 16-18 GB at 4-bit.
- 31B -- dense 31B, text/image, 17-20 GB at 4-bit. Maximum quality.

Speed Benchmarks

Ollama: E2B 95 tok/s | E4B 57 tok/s | 26B ~2 tok/s (swapping) | 31B won't fit
Unsloth MLX: E2B 81 tok/s (3.6 GB) | E4B 49 tok/s (5.6 GB)

Ollama is 15-20% faster. Unslo
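As a rough sanity check on the 4-bit memory figures: the quantized weights alone take about params × 0.5 bytes, and the gap up to the reported footprints is plausibly the vision/audio encoders, layers kept at higher precision, and runtime overhead. A minimal sketch (the parameter counts and reported sizes are from my table above; attributing the gap to overhead is my assumption):

```python
def raw_weight_gb(params_billions: float, bits: int = 4) -> float:
    """Approximate memory for the quantized weights alone (decimal GB)."""
    return params_billions * 1e9 * bits / 8 / 1e9

# Raw 4-bit weights vs. the footprints I measured -- the difference is
# encoders, higher-precision tensors, and runtime overhead.
for name, params, reported in [("E2B", 2.3, "~4"), ("E4B", 4.5, "~5.5")]:
    print(f"{name}: ~{raw_weight_gb(params):.1f} GB weights, {reported} GB in practice")
```

So E2B's weights are only about 1.2 GB at 4-bit; the multimodal stack and runtime roughly triple that in practice.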
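If you want to reproduce the tok/s numbers yourself, Ollama's non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which is what I divide to get tokens per second. A small sketch -- the model tag `gemma4:e2b` is a placeholder, so substitute whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

def generation_speed(resp: dict) -> float:
    """Tokens/sec from Ollama's eval_count and eval_duration (ns) fields."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

def benchmark(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation against a local Ollama and return tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return generation_speed(json.load(r))

# Example (needs a running Ollama; the model tag is a guess):
# print(benchmark("gemma4:e2b", "Explain KV caching in two sentences."))
```

Run the prompt a few times and discard the first result, since the first call pays the model-load cost.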
Continue reading on Dev.to
