Building a Local RAG Pipeline on Mobile: Vector Search with SQLite, On-Device Embeddings, and a Shared KMP Architecture

via Dev.to Webdev · SoftwareDevs · mvpfactory.io · 4h ago

---
title: "Local RAG on Mobile: Vector Search Under 200ms with KMP"
published: true
description: "Build a fully offline RAG pipeline on mobile using sqlite-vss, ONNX Runtime, and a shared KMP repository layer, in under 50MB with sub-200ms latency."
tags: kotlin, android, mobile, architecture
canonical_url: https://blog.mvpfactory.co/local-rag-on-mobile-vector-search-under-200ms
---

## What We Are Building

Let me show you a pattern I use in every project that needs smart, offline search. We are building a complete retrieval-augmented generation (RAG) pipeline (embedding generation, vector similarity search, and context assembly) running entirely on-device. No network calls. No cloud dependencies.

By the end of this tutorial, you will have a shared Kotlin Multiplatform (KMP) module that takes a user query, generates an embedding via ONNX Runtime, searches a sqlite-vss index, and returns ranked results. The full pipeline clocks in at ~140ms p95 on a Pixel 7a with a 38MB total footprint.

## Prerequisites

- Kotl
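The flow described under "What We Are Building" (query, then embedding, then vector search, then ranked results and context) can be sketched in plain Kotlin that would fit in a KMP `commonMain` source set. This is a minimal sketch under stated assumptions, not the author's implementation: `Embedder`, `VectorIndex`, `SearchHit`, `InMemoryVectorIndex`, `RagRepository`, `LetterCountEmbedder`, and `cosine` are all hypothetical names. The ONNX Runtime and sqlite-vss bindings are deliberately left out, and a brute-force cosine-similarity index stands in for sqlite-vss so the example runs with only the Kotlin standard library.

```kotlin
import kotlin.math.sqrt

// Hypothetical abstraction over the on-device embedding model.
// In the article's setup this would be backed by ONNX Runtime; that
// platform-specific binding is not shown here.
interface Embedder {
    fun embed(text: String): FloatArray
}

// One ranked hit returned by the vector index (hypothetical type).
data class SearchHit(val id: Long, val text: String, val score: Float)

// Hypothetical abstraction over the vector store. The article uses a
// sqlite-vss index; this interface only captures "top-k nearest chunks".
interface VectorIndex {
    fun search(query: FloatArray, k: Int): List<SearchHit>
}

// Cosine similarity; returns 0 when either vector has zero length.
fun cosine(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "dimension mismatch" }
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]
    }
    return if (na == 0f || nb == 0f) 0f else dot / (sqrt(na) * sqrt(nb))
}

// Brute-force stand-in for sqlite-vss: scores every stored row and keeps
// the k best. Fine for a sketch or tests, not for large corpora.
class InMemoryVectorIndex(private val embedder: Embedder) : VectorIndex {
    private val rows = mutableListOf<Triple<Long, String, FloatArray>>()

    fun add(id: Long, text: String) {
        rows += Triple(id, text, embedder.embed(text))
    }

    override fun search(query: FloatArray, k: Int): List<SearchHit> =
        rows.map { (id, text, vec) -> SearchHit(id, text, cosine(query, vec)) }
            .sortedByDescending { it.score }
            .take(k)
}

// Shared repository: query -> embedding -> vector search -> ranked hits.
class RagRepository(private val embedder: Embedder, private val index: VectorIndex) {
    fun retrieve(query: String, k: Int = 3): List<SearchHit> =
        index.search(embedder.embed(query), k)

    // Context assembly: join the top hits into one prompt-ready block.
    fun buildContext(query: String, k: Int = 3): String =
        retrieve(query, k).joinToString("\n") { "[${it.id}] ${it.text}" }
}

// Toy embedder for testing only: counts letters a..z into a 26-dim vector.
class LetterCountEmbedder : Embedder {
    override fun embed(text: String): FloatArray {
        val v = FloatArray(26)
        for (c in text.lowercase()) if (c in 'a'..'z') v[c - 'a'] += 1f
        return v
    }
}
```

Because `RagRepository` depends only on the two interfaces, each platform could supply its own ONNX Runtime embedder and sqlite-vss index while retrieval and context assembly stay in shared code.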
