
I Tested Every Gemma 4 Model Locally on My MacBook - What Actually Works
Audio ASR in three languages, image understanding, full-stack app generation, coding, and agentic behavior -- all running on a MacBook M4 Pro with 24 GB of RAM. Interactive version with playable audio, live charts, and the working React app: gemma4-benchmark.pages.dev

Google just released Gemma 4, its new family of open multimodal models: four sizes, Apache-2.0 licensed, supporting text, image, and audio input. I spent a day testing every variant -- real audio files, real images, code that has to compile and run. Here is my honest report.

The Gemma 4 Family

- E2B -- dense 2.3B, text/image/audio, ~4 GB at 4-bit. Phones and edge devices.
- E4B -- dense 4.5B, text/image/audio, ~5.5 GB at 4-bit. Laptops.
- 26B-A4B -- MoE, 4B active / 26B total, text/image, 16-18 GB at 4-bit.
- 31B -- dense 31B, text/image, 17-20 GB at 4-bit. Maximum quality.

Speed Benchmarks

Ollama: E2B 95 tok/s | E4B 57 tok/s | 26B ~2 tok/s (swapping) | 31B won't fit
Unsloth MLX: E2B 81 tok/s (3.6 GB) | E4B 49 tok/s (5.6 GB)

Ollama is 15-20% faster. Unslo
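As a rough sanity check on the 4-bit memory figures: the quantized weights alone take about params × 0.5 bytes, and the gap up to the reported footprints is plausibly the vision/audio encoders, layers kept at higher precision, and runtime overhead. A minimal sketch (the parameter counts and reported sizes are from my table above; attributing the gap to overhead is my assumption):

```python
def raw_weight_gb(params_billions: float, bits: int = 4) -> float:
    """Approximate memory for the quantized weights alone (decimal GB)."""
    return params_billions * 1e9 * bits / 8 / 1e9

# Raw 4-bit weights vs. the footprints I measured -- the difference is
# encoders, higher-precision tensors, and runtime overhead.
for name, params, reported in [("E2B", 2.3, "~4"), ("E4B", 4.5, "~5.5")]:
    print(f"{name}: ~{raw_weight_gb(params):.1f} GB weights, {reported} GB in practice")
```

So E2B's weights are only about 1.2 GB at 4-bit; the multimodal stack and runtime roughly triple that in practice.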
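If you want to reproduce the tok/s numbers yourself, Ollama's non-streaming `/api/generate` response includes `eval_count` (generated tokens) and `eval_duration` (nanoseconds), which is what I divide to get tokens per second. A small sketch -- the model tag `gemma4:e2b` is a placeholder, so substitute whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

def generation_speed(resp: dict) -> float:
    """Tokens/sec from Ollama's eval_count and eval_duration (ns) fields."""
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

def benchmark(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation against a local Ollama and return tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return generation_speed(json.load(r))

# Example (needs a running Ollama; the model tag is a guess):
# print(benchmark("gemma4:e2b", "Explain KV caching in two sentences."))
```

Run the prompt a few times and discard the first result, since the first call pays the model-load cost.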
Continue reading on Dev.to
