
Run LLMs locally in Flutter apps
In this tutorial, you'll learn how to run a large language model (LLM) directly on a user's device: no cloud, no server, no cost. We'll start from scratch, build a working chat interface, and progressively introduce more advanced features: tool calling, sampling, and RAG. Each concept is explained before the code, so you can follow along whether you're new to on-device AI or just new to NobodyWho.

The example app for this article is available on GitHub if you want to jump straight to working code. It is kept up to date with the latest features; if you want the code that matches this tutorial exactly, check out this commit.

Why Run AI On-Device?

Most AI features rely on a cloud API: you send a request to a remote server, it runs the model, and it sends a response back. That works well, but it comes with tradeoffs. Running the model directly on the device avoids all of them:

- Works offline: no internet connection required
- Privacy by design: user data never leaves the device
- Low latency



