Ollama in Docker Compose with GPU and Persistent Model Storage
How-To · DevOps


via Dev.to · Rost

Ollama works great on bare metal. It gets even more interesting when you treat it like a service: a stable endpoint, pinned versions, persistent storage, and a GPU that is either available or it is not.

This post focuses on one goal: a reproducible local or single-node Ollama "server" using Docker Compose, with GPU acceleration and persistent model storage. It intentionally skips generic Docker and Compose basics. When you need a compact list of the commands you reach for most often (images, containers, volumes, docker compose), the Docker Cheatsheet is a good companion. When you want HTTPS in front of Ollama, correct streaming and WebSocket proxying, and edge controls (auth, timeouts, rate limits), see Ollama behind a reverse proxy with Caddy or Nginx for HTTPS streaming. For how Ollama fits alongside vLLM, Docker Model Runner, LocalAI, and cloud hosting trade-offs, see LLM Hosting in 2026: Local, Self-Hosted & Cloud Infrastructure Compared.

When Compose beats a bare metal install
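The full walkthrough is behind the link, but the setup the intro describes (a GPU that is either available or it is not, plus persistent model storage) can be sketched as a minimal docker-compose.yml. This is a hedged sketch, not the article's own file: the service name, volume name, and pinned image tag are illustrative assumptions, while the `ollama/ollama` image, the 11434 API port, the `/root/.ollama` model directory, and the Compose `deploy.resources.reservations.devices` GPU syntax are standard.

```yaml
services:
  ollama:
    image: ollama/ollama:0.5.7        # pin a version; this tag is an illustrative assumption
    ports:
      - "11434:11434"                 # Ollama's default HTTP API port
    volumes:
      - ollama-models:/root/.ollama   # models survive container recreation
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia          # requires the NVIDIA Container Toolkit on the host
              count: all
              capabilities: [gpu]

volumes:
  ollama-models:                      # named volume, kept by `docker compose down` (without -v)
```

Bring it up with `docker compose up -d`, then pull a model with `docker compose exec ollama ollama pull llama3.2`; because models live in a named volume, recreating the container does not re-download them.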

Continue reading on Dev.to

