I Ran Three LLMs Entirely in the Browser to Power an AI Coaching Feature. Here's What I Measured.


Michael Stelly, via Dev.to

I'm building Holocron, a browser-based combat log analyzer for the Star Wars: The Old Republic (SWTOR) video game. The core product thesis is that parsers that stop at showing you numbers aren't useful enough. A good tool tells you what to do about them. The coaching layer I'm building takes ~1500 tokens of structured combat stats (spec, abilities, DPS numbers, rule-based findings) and returns ~500 tokens of plain-language guidance. It runs after parsing, entirely client-side. No server. No account. No data leaving the browser.

I already had Ollama working as a local LLM provider. But Ollama requires the user to install a background service, pull a model, and make sure it's running. For a tool where frictionless entry is a design constraint, that's a real drop-off risk.

So I ran a spike to find out whether @mlc-ai/web-llm with WebGPU could replace that setup entirely: just open the page, wait under 30 seconds on first visit (measured 23.7s on the test hardware), and get AI coaching w…
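To make the shape of that coaching layer concrete, here is a minimal sketch of the prompt-assembly step: turning the parser's structured stats into the compact text the model receives. The type and function names (`CombatStats`, `buildCoachingPrompt`) and the exact field set are hypothetical, not Holocron's actual code; web-llm itself exposes an OpenAI-style chat completions API, so the resulting string would be sent as a chat message.

```typescript
// Hypothetical sketch of the coaching prompt builder.
// Names, fields, and formatting are illustrative assumptions,
// not Holocron's real implementation.
interface CombatStats {
  spec: string;                                  // e.g. player class/discipline
  dps: number;                                   // parsed damage per second
  abilities: { name: string; usagePct: number }[];
  findings: string[];                            // rule-based findings from the parser
}

// Serialize parsed combat stats into the ~1500-token prompt the model sees.
function buildCoachingPrompt(stats: CombatStats): string {
  const abilityLines = stats.abilities
    .map((a) => `- ${a.name}: ${a.usagePct}% of activations`)
    .join("\n");
  const findingLines = stats.findings.map((f) => `- ${f}`).join("\n");
  return [
    `Spec: ${stats.spec}`,
    `DPS: ${stats.dps}`,
    `Ability usage:\n${abilityLines}`,
    `Findings:\n${findingLines}`,
    "Give plain-language coaching advice based on the stats above.",
  ].join("\n\n");
}

// In the real app, this string would go through the LLM provider
// (Ollama or web-llm) as the user message of a chat completion.
```

Keeping this step a pure function means the same prompt can feed either backend, which is exactly what makes swapping Ollama for web-llm a contained change.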

Continue reading on Dev.to
