FlareStart

Show HN: ZSE – Open-source LLM inference engine with 3.9s cold starts
How-To, Tools


via Hacker News • zyoralabs • 1mo ago

I've been building ZSE (Z Server Engine) for the past few weeks. It is an open-source LLM inference engine focused on two things nobody has fully solved together: memory efficiency and fast cold starts.

The problem I was trying to solve: running a 32B model normally requires ~64 GB VRAM. Most developers don't have that. And even when quantization helps with memory, cold starts with bitsandbytes NF4 take 2+ minutes on first load and 45–120 seconds on warm restarts, which kills serverless and autoscaling use cases.

What ZSE does differently:

  • Fits 32B in 19.3 GB VRAM (70% reduction vs FP16); runs on a single A100-40GB
  • Fits 7B in 5.2 GB VRAM (63% reduction); runs on consumer GPUs
  • Native .zse pre-quantized format with memory-mapped weights: 3.9s cold start for 7B, 21.4s for 32B, vs 45s and 120s with bitsandbytes, ~30s for vLLM
  • All benchmarks verified on Modal A100-80GB (Feb 2026)

It ships with:

  • OpenAI-compatible API server (drop-in replacement)
  • Interactive CLI (zse serve, zse chat, zse conve
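
The memory figures quoted above can be sanity-checked with a little arithmetic. The sketch below is not part of ZSE; it assumes 1 GB = 1e9 bytes, 2 bytes per parameter for FP16, and that "32B" and "7B" mean exactly 32e9 and 7e9 parameters. The helper names are hypothetical. The "effective bits" figure divides the total reported VRAM across all parameters, so it includes any runtime overhead and is only a rough upper bound on the weight precision.

```python
# Back-of-the-envelope check of the VRAM numbers quoted in the post.
# Assumptions (not from the post): 1 GB = 1e9 bytes, FP16 = 2 bytes/param,
# "32B" = 32e9 parameters, "7B" = 7e9 parameters.

BYTES_PER_PARAM_FP16 = 2
GB = 1e9


def fp16_weights_gb(params: float) -> float:
    """Size of the raw FP16 weights in GB."""
    return params * BYTES_PER_PARAM_FP16 / GB


def reduction_pct(baseline_gb: float, actual_gb: float) -> float:
    """Percentage saved relative to the baseline footprint."""
    return (1 - actual_gb / baseline_gb) * 100


def effective_bits_per_param(params: float, actual_gb: float) -> float:
    """Reported VRAM spread over all parameters, in bits (includes overhead)."""
    return actual_gb * GB * 8 / params


if __name__ == "__main__":
    # (label, parameter count, VRAM in GB as reported in the post)
    for label, params, reported_gb in [("32B", 32e9, 19.3), ("7B", 7e9, 5.2)]:
        baseline = fp16_weights_gb(params)
        saved = reduction_pct(baseline, reported_gb)
        bits = effective_bits_per_param(params, reported_gb)
        print(f"{label}: FP16 {baseline:.1f} GB, reported {reported_gb} GB, "
              f"reduction {saved:.1f}%, ~{bits:.2f} bits/param")
```

Under these assumptions the FP16 baselines come out to 64 GB and 14 GB, and the reductions round to the 70% and 63% stated in the post.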

Continue reading on Hacker News

