FlareStart
© 2026 FlareStart. All rights reserved.

vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

via Dev.to Python · soy · 3h ago

The Problem: vLLM Hogs Your GPU 24/7

If you run a local LLM with vLLM, you know the pain. The moment you start the server, it claims ~90% of your VRAM (vLLM's default --gpu-memory-utilization is 0.9) and never lets go, even when nobody's asking it anything. On a dedicated inference server, that's fine. But on a single consumer GPU (an RTX 5090 in my case), I also need VRAM for:

  • A Shogi engine (DL-based, needs ~4GB VRAM)
  • Whisper transcription (large-v3, GPU-accelerated)
  • Training runs, experiments, and occasional gaming

Running vLLM permanently means everything else fights for scraps. Killing and restarting vLLM by hand every time is not a workflow; it's a chore.

The Solution: A Gateway That Manages vLLM's Lifecycle

I wrote a single-file FastAPI gateway (vllm_gateway.py, ~390 lines) that:

  • Listens on port 8000 with near-zero VRAM usage
  • Auto-starts vLLM on an internal port (8100) when a request arrives
  • Auto-stops vLLM after 10 minutes of idle time, fully freeing VRAM
  • Rewrites tool calls from Nemotron's <TOOLCALL> format to OpenAI-compatible tool_
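The start-on-demand / stop-when-idle lifecycle in the list above can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the author's vllm_gateway.py: the OnDemandBackend class, its request() context manager, and the http_ok readiness probe are hypothetical names. It assumes the backend runs as a child process, that the gateway polls it over HTTP until it is ready before forwarding, and that "idle" means no request in flight for the whole timeout window (600 s here, matching the 10 minutes above).

```python
import subprocess
import sys
import threading
import time
import urllib.request
from contextlib import contextmanager
from typing import Callable, Iterator, Optional, Sequence


def http_ok(url: str) -> Callable[[], bool]:
    """Readiness probe: returns True once `url` answers with HTTP 200."""
    def check() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                return resp.status == 200
        except OSError:  # refused, timed out, or HTTP error status
            return False
    return check


class OnDemandBackend:
    """Hypothetical helper: start a backend process on first use and stop it
    once no request has been in flight for idle_timeout_s seconds."""

    def __init__(self, cmd: Sequence[str], idle_timeout_s: float = 600.0,
                 is_ready: Callable[[], bool] = lambda: True,
                 check_interval_s: float = 1.0) -> None:
        self.cmd = list(cmd)
        self.idle_timeout_s = idle_timeout_s  # 600 s = the 10-minute window
        self.is_ready = is_ready
        self.check_interval_s = check_interval_s
        self._proc: Optional[subprocess.Popen] = None
        self._active = 0  # requests currently being served
        self._last_used = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._watchdog = threading.Thread(target=self._watch_idle, daemon=True)
        self._watchdog.start()

    def running(self) -> bool:
        return self._proc is not None and self._proc.poll() is None

    @contextmanager
    def request(self, startup_timeout_s: float = 300.0) -> Iterator[None]:
        """Wrap each proxied request: cold-start the backend if needed and
        keep it alive until the request completes."""
        with self._lock:
            self._active += 1
            self._last_used = time.monotonic()
            if not self.running():
                print("starting backend: " + " ".join(self.cmd), file=sys.stderr)
                self._proc = subprocess.Popen(self.cmd)
        try:
            deadline = time.monotonic() + startup_timeout_s
            while not self.is_ready():  # loading model weights takes time
                if time.monotonic() > deadline:
                    raise TimeoutError("backend did not become ready in time")
                time.sleep(0.2)
            yield
        finally:
            with self._lock:
                self._active -= 1
                self._last_used = time.monotonic()

    def _watch_idle(self) -> None:
        # Poll every check_interval_s until close() sets the stop event.
        while not self._stop.wait(self.check_interval_s):
            with self._lock:
                idle = time.monotonic() - self._last_used
                if self._active == 0 and self.running() and idle >= self.idle_timeout_s:
                    print(f"idle for {idle:.0f}s, stopping backend", file=sys.stderr)
                    self._terminate_locked()

    def _terminate_locked(self) -> None:
        # Caller holds self._lock. terminate() sends SIGTERM on POSIX; once the
        # process exits, the GPU memory it held is freed. Escalate if it hangs.
        assert self._proc is not None
        self._proc.terminate()
        try:
            self._proc.wait(timeout=30)
        except subprocess.TimeoutExpired:
            self._proc.kill()
            self._proc.wait()
        self._proc = None

    def close(self) -> None:
        """Stop the watchdog and the backend (e.g. on gateway shutdown)."""
        self._stop.set()
        self._watchdog.join()
        with self._lock:
            if self.running():
                self._terminate_locked()
```

In a gateway, one might construct it as OnDemandBackend(["vllm", "serve", "<model>", "--port", "8100"], is_ready=http_ok("http://127.0.0.1:8100/health")) and wrap each forwarded request in "with backend.request():". Counting in-flight requests, rather than only timestamping their arrival, keeps a long streaming generation from being killed mid-response. Because request() blocks while the model loads, an async FastAPI handler would need to run it in a worker thread or use an asyncio equivalent.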
