
Why on-device agentic AI can't keep up
There's a growing narrative that on-device AI is about to free us from the cloud, and the pitch is compelling: local inference means privacy, zero latency, and no API costs. Run your own agents on your computer or phone, no cloud required.

The pace of improvement in open-weights models has indeed been spectacular, and if you've got (tens of) thousands to drop on a Mac Studio cluster or a high-end GPU setup, local models are genuinely useful. But for the other 99% of devices people actually carry around, every time I open llama.cpp to do some local on-device work, it feels, if anything, like progress is going backwards relative to what I can do with frontier models. There are some hard physical limits to what consumer hardware can do, and they're not going away any time soon (a back-of-envelope sketch follows below).

For the purposes of this article, I'm referring to agentic capabilities in a personal admin capacity: think searching emails, composing a reply, and sending a calendar invite. More advanced capabilities like we see in
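To make those "hard physical limits" concrete, here is a minimal back-of-envelope sketch. Autoregressive decoding has to stream every model weight through memory for each generated token, so memory bandwidth sets a hard ceiling on tokens per second regardless of compute. All bandwidth and model-size figures below are illustrative assumptions on my part, not measurements from this article:

```python
# Back-of-envelope: decode speed on memory-bandwidth-bound hardware.
# All numbers are illustrative assumptions, not benchmarks.

def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Theoretical ceiling: each generated token must stream every weight
    through memory once, so throughput <= bandwidth / model size."""
    return bandwidth_gb_s / model_size_gb

# Assumed figures: a phone-class SoC (~50 GB/s LPDDR) vs. a datacenter
# GPU (~3,350 GB/s HBM3), both running a 7B model quantized to 4 bits
# (~3.5 GB of weights).
phone_bw_gb_s = 50.0
gpu_bw_gb_s = 3350.0
model_gb = 3.5

print(f"phone ceiling: ~{max_tokens_per_second(phone_bw_gb_s, model_gb):.0f} tok/s")
print(f"GPU ceiling:   ~{max_tokens_per_second(gpu_bw_gb_s, model_gb):.0f} tok/s")
```

Under these assumed numbers, a phone-class device tops out around 14 tokens/second on a 4-bit 7B model before compute, battery, or thermal constraints even enter the picture, while the datacenter GPU has roughly 70x more headroom on the same model.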
Continue reading on Dev.to
