
llama-swap Model Switcher Quickstart for OpenAI-Compatible Local LLMs
Soon you are juggling vLLM, llama.cpp, and more, each stack on its own port. Everything downstream still wants a single /v1 base URL; otherwise you keep shuffling ports, profiles, and one-off scripts. llama-swap is the /v1 proxy in front of those stacks: one OpenAI- and Anthropic-compatible front door, with a YAML file that maps each model name to the command that starts the right upstream. Request a model and the proxy starts or swaps to it; configure TTLs and groups when VRAM is tight or several models must coexist.

This guide covers install paths, a practical config.yaml, the HTTP surface, and the failure modes that show up once streaming and reverse proxies enter the picture. For a broader comparison of LLM hosting options, see LLM Hosting in 2026: Local, Self-Hosted & Cloud Infrastructure Compared.

llama-swap model switcher overview for OpenAI-compatible local LLM APIs

llama-swap is a lightweight proxy server built around a simple operational model: one binary, one YAML config file.
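As a sketch of that name-to-command mapping, a minimal config.yaml might look like the following. This is illustrative, not authoritative: the model names, file paths, ports, and the exact `ttl` semantics here are assumptions, so check llama-swap's own documentation for the precise schema.

```yaml
# Hypothetical config.yaml sketch for llama-swap.
# Each key under "models" is the name clients send in the request's
# "model" field; "cmd" is the command the proxy runs to start that
# upstream. Paths, ports, and model names are placeholders.
models:
  qwen2.5-7b:
    cmd: llama-server --port 9001 -m /models/qwen2.5-7b-q4.gguf
    ttl: 300   # assumed: seconds of idle time before the upstream is stopped
  big-model:
    cmd: vllm serve /models/big-model --port 9002
```

With a mapping like this, requesting `qwen2.5-7b` would start (or swap to) the llama-server process, while requesting `big-model` would swap to the vLLM instance; TTLs let idle upstreams be shut down to reclaim VRAM.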
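On the client side, "request a model" just means sending a standard OpenAI-style request whose "model" field names an entry in the proxy's config. A minimal sketch of such a request body, assuming a hypothetical model name and port (llama-swap's actual default port may differ):

```python
import json

# Hypothetical model name: llama-swap routes by the "model" field,
# matching it against the model names declared in its YAML config.
payload = {
    "model": "qwen2.5-7b",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}

# A client would POST this body to the proxy's OpenAI-compatible
# endpoint, e.g. http://localhost:8080/v1/chat/completions
# (URL and port are assumptions for illustration).
body = json.dumps(payload)
```

Because the front door is OpenAI-compatible, any existing client that takes a base URL and a model name can drive the swap without knowing which backend stack actually serves the request.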



