
Ollama Just Got Stupid Fast on Mac and Nobody Is Talking About What This Actually Means
So Ollama dropped version 0.19 yesterday and I genuinely think most people are sleeping on how big this is. They rebuilt the entire Mac backend on top of Apple's MLX framework, and the speed numbers are kind of absurd. We're talking 1,851 tokens per second on prefill and 134 tokens per second on decode. If those numbers don't mean anything to you, let me put it this way: that's roughly twice as fast as the previous version. On the same hardware. Same model. Just better software underneath.

I've been running local models on my MacBook for months now, and the experience has always been this weird mix of "wow, this actually works" and "OK, why is it taking 15 seconds to start responding?" That second part just got obliterated. The time-to-first-token improvement alone changes how it feels to use coding agents locally. When you're running something like Claude Code or OpenCode through Ollama and it responds in under a second instead of making you wait, that's not just a benchmark win, that's a wo
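To see why those two throughput numbers translate into sub-second responsiveness, here's a back-of-envelope latency model using the figures quoted above (1,851 tok/s prefill, 134 tok/s decode). The function and the simple linear model are my own illustration, not anything from Ollama's API:

```python
# Rough latency estimate from prefill/decode throughput.
# Assumption (mine, not Ollama's): TTFT is dominated by prompt
# processing, and output then streams at the decode rate.

def estimate_latency(prompt_tokens: int, output_tokens: int,
                     prefill_tps: float = 1851.0,
                     decode_tps: float = 134.0) -> tuple[float, float]:
    """Return (time_to_first_token, total_time) in seconds."""
    ttft = prompt_tokens / prefill_tps          # prompt processed at prefill speed
    decode_time = output_tokens / decode_tps    # reply streamed at decode speed
    return ttft, ttft + decode_time

# A 1,000-token coding-agent prompt with a 200-token reply:
ttft, total = estimate_latency(1000, 200)
print(f"TTFT ~{ttft:.2f}s, total ~{total:.2f}s")
# prints: TTFT ~0.54s, total ~2.03s
```

Even a hefty 1,000-token agent prompt starts streaming in about half a second at these rates, which matches the "responds in under a second" feel described above.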
Continue reading on Dev.to



