
# Self-Hosting AI Models on a Budget VPS: A Practical Workshop
## What We Are Building

By the end of this workshop, you will have a self-hosted LLM running on a budget VPS behind a request queue — and more importantly, you will know exactly when this setup makes financial sense versus just calling an API. Let me show you the numbers, the minimal setup, and the decision framework I use with every team that asks me about this.

## Prerequisites

- A VPS with at least 16 GB RAM and 4 vCPUs (Hetzner CX42 or equivalent, ~$24/month)
- Docker installed on the instance
- Basic comfort with the terminal
- An honest willingness to look at spreadsheets

## Step 1: Understand the Hardware Floor

Here is the gotcha that will save you hours: LLM inference is memory-bound, not compute-bound. The model's parameter count dictates your RAM floor before anything else.

| Model | Parameters | Min RAM (Q4 Quantized) | Recommended VPS |
|---|---|---|---|
| Phi-3 Mini | 3.8B | 3 GB | 8 GB / 4 vCPU |
| Llama 3.1 8B | 8B | 5 GB | 16 GB / 4 vCPU |
| Mistral 7B | 7.3B | 5 GB | 16 GB / 4 vCPU |
| Qwen2.5 32B | 32B | 20 GB | 64 GB / dedicated GPU |

The sweet spot
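The RAM floor above follows from simple arithmetic: at Q4 quantization, each parameter takes roughly half a byte, and you then add headroom for the KV cache, runtime buffers, and the OS. Here is a back-of-envelope sketch of that calculation — the 0.5 bytes/parameter and ~25% overhead figures are rough assumptions, not benchmarks, so treat the output as a floor rather than a sizing guarantee.

```python
def ram_floor_gb(params_billions: float, bits_per_param: int = 4,
                 overhead: float = 1.25) -> float:
    """Rough minimum RAM (GB) to serve a quantized model.

    Assumptions: weights dominate memory use; Q4 = 4 bits/param;
    ~25% extra for KV cache, buffers, and the OS.
    """
    weight_gb = params_billions * bits_per_param / 8  # 1B params at 1 byte/param = 1 GB
    return round(weight_gb * overhead, 1)

if __name__ == "__main__":
    for name, params in [("Phi-3 Mini", 3.8), ("Llama 3.1 8B", 8.0),
                         ("Mistral 7B", 7.3), ("Qwen2.5 32B", 32.0)]:
        print(f"{name}: ~{ram_floor_gb(params)} GB")
```

Running this reproduces the table's ballpark: an 8B model at Q4 needs about 5 GB, and a 32B model about 20 GB — which is exactly why the 16 GB VPS tier is the cutoff between the 7–8B class and anything larger.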

