Running NVIDIA Nemotron on a Mac with Docker Model Runner: What You Need to Know
How-To, Systems


By Ajeet Singh Raina, via Dev.to

The NVIDIA Nemotron family is one of the most exciting model releases in recent months: efficient, capable, and increasingly accessible. With Docker Model Runner, pulling and running LLMs locally is as simple as pulling a container image. So naturally, I wanted to see how far I could push Nemotron on my Mac. Here's what I learned.

How Docker Model Runner Works on Mac

On macOS, Docker Model Runner uses the vllm-metal backend, a Metal-accelerated inference stack built on top of mlx-lm for Apple Silicon. This means models run natively in your Mac's unified memory using the GPU, with no cloud dependency and no Docker Linux VM involved in inference. The command to run any Hugging Face model is clean and familiar:

docker model run hf.co/<model-name>

The Nemotron Family: Two Very Different Architectures

Before diving in, it's worth understanding that not all Nemotron models are built the same way. Nemotron-3-Nano-30B is a hybrid SSM (State Space Model): it combines Mamba2 layers with attention layers.
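As a sketch of the workflow described above, here is how a typical pull-and-run session looks with the Docker Model Runner CLI. The `<org>/<model-name>` path is a placeholder, not a specific Nemotron tag; substitute the Hugging Face repository you want to run.

```shell
# Pull a model from Hugging Face so it is cached locally
docker model pull hf.co/<org>/<model-name>

# Start an interactive chat session with the model
docker model run hf.co/<org>/<model-name>

# Or pass a prompt for a one-shot, non-interactive answer
docker model run hf.co/<org>/<model-name> "Explain what a state space model is."

# See which models are available locally
docker model list
```

The pull step is optional: `docker model run` will fetch the model on first use, the same way `docker run` pulls a missing container image.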

Continue reading on Dev.to
