
Running AI in the Browser with Gemma 4 (No API, No Server)
Most “AI apps” today are just API wrappers. That’s fine… until you care about latency, cost, or privacy. I’ve been exploring what it actually takes to run LLMs inside the browser, and Gemma 4 completely changes what’s possible. This is not theory; this is what actually works.

Why Gemma 4 is different

Gemma 4 isn’t just another model release. It’s designed for:

• on-device inference
• agentic workflows
• multimodal tasks (text, audio, vision)

The important part? 👉 The E2B / E4B variants are small enough to run inside a browser tab. No backend required.

⚙️ How it actually runs in the browser

Let’s cut the hype. There are only two real approaches:

1. MediaPipe LLM Inference (recommended)

• WebAssembly + WebGPU under the hood
• Load the model like this:

```js
import { FilesetResolver, LlmInference } from "@mediapipe/tasks-genai";

// Resolve the WASM backend first, then point at the model file.
const genai = await FilesetResolver.forGenAiTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai/wasm"
);
const llm = await LlmInference.createFromOptions(genai, {
  baseOptions: { modelAssetPath: "/models/gemma-4-E2B.litertlm" },
});
```

That’s it. You now have:

• streaming responses
• token control
• temperature, top-k, etc.

2. WebGPU (Transformers.js style)

More control, more pain.
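The “streaming responses” bullet above corresponds to MediaPipe’s progress-listener form of `generateResponse(prompt, callback)`. Here is a minimal sketch of collecting those partial results; `makeStreamCollector` is a hypothetical helper, not part of the MediaPipe API, and the `(partialResult, done)` callback shape is assumed from the tasks-genai docs.

```javascript
// Hypothetical helper: accumulates streamed partial results into one string.
// The (partialResult, done) shape matches MediaPipe's progress listener.
function makeStreamCollector(onDone) {
  let text = "";
  return {
    listener(partialResult, done) {
      text += partialResult;
      if (done && onDone) onDone(text);
    },
    result: () => text,
  };
}

// With a real model you would wire it up like:
//   llm.generateResponse("Write a haiku about browsers.", collector.listener);
// Simulated chunks stand in for model output here:
const collector = makeStreamCollector((full) => console.log("done:", full));
["Brow", "sers ", "dream"].forEach((chunk, i, arr) =>
  collector.listener(chunk, i === arr.length - 1)
);
```

In a real UI you would append each `partialResult` to the page as it arrives instead of buffering the whole string.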
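Both approaches lean on WebGPU, with a plain WebAssembly fallback when it isn’t available. A quick sketch of the capability check, assuming only standard browser APIs — `pickBackend` is a made-up helper, and the presence of `navigator.gpu` is the usual WebGPU feature-detection signal, not a guarantee an adapter will actually be granted:

```javascript
// Hypothetical helper: choose an inference backend from browser capabilities.
// `navigator.gpu` existing is the standard WebGPU feature check; a real app
// should also await navigator.gpu.requestAdapter() before committing.
function pickBackend(nav) {
  return nav && "gpu" in nav ? "webgpu" : "wasm";
}

// In the browser you would call pickBackend(navigator).
console.log(pickBackend({ gpu: {} })); // "webgpu"
console.log(pickBackend({}));          // "wasm"
```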



