Self-Hosting AI Models on a Budget VPS: A Practical Workshop


via Dev.to Webdev (mvpfactory.io)

What We Are Building

By the end of this workshop, you will have a self-hosted LLM running on a budget VPS behind a request queue, and more importantly, you will know exactly when this setup makes financial sense versus just calling an API. Let me show you the numbers, the minimal setup, and the decision framework I use with every team that asks me about this.

Prerequisites

- A VPS with at least 16 GB RAM and 4 vCPUs (Hetzner CX42 or equivalent, ~$24/month)
- Docker installed on the instance
- Basic comfort with the terminal
- An honest willingness to look at spreadsheets

Step 1: Understand the Hardware Floor

Here is the gotcha that will save you hours: LLM inference is memory-bound, not compute-bound. The model's parameter count dictates your RAM floor before anything else.

| Model | Parameters | Min RAM (Q4 quantized) | Recommended VPS |
|---|---|---|---|
| Phi-3 Mini | 3.8B | 3 GB | 8 GB / 4 vCPU |
| Llama 3.1 8B | 8B | 5 GB | 16 GB / 4 vCPU |
| Mistral 7B | 7.3B | 5 GB | 16 GB / 4 vCPU |
| Qwen2.5 32B | 32B | 20 GB | 64 GB / dedicated GPU |

The sweet spot
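The RAM figures in the table follow a simple rule of thumb: a Q4-quantized model needs roughly half a byte per parameter, plus a bit of fixed overhead for the KV cache and runtime. A minimal sketch of that estimate (the 0.5 bytes/param and 1 GB overhead constants are my assumptions, not exact figures from any runtime):

```python
def q4_ram_floor_gb(params_billions, bytes_per_param=0.5, overhead_gb=1.0):
    """Rough RAM floor for a Q4-quantized model.

    Assumes ~0.5 bytes per parameter (4-bit weights) plus a fixed
    overhead for KV cache and the inference runtime. These constants
    are ballpark assumptions; real usage varies with context length.
    """
    return params_billions * bytes_per_param + overhead_gb

# Llama 3.1 8B: 8 * 0.5 + 1 = 5 GB, in line with the table above.
print(q4_ram_floor_gb(8))
# Phi-3 Mini: 3.8 * 0.5 + 1 = 2.9 GB, close to the table's 3 GB.
print(q4_ram_floor_gb(3.8))
```

Larger models drift above this floor (Qwen2.5 32B lands near 17 GB by this formula versus 20 GB in the table), mostly because KV cache overhead grows with model size and context length rather than staying fixed.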

Continue reading on Dev.to Webdev
