Running NVIDIA Nemotron on a Mac with Docker Model Runner: What You Need to Know
How-To, Systems


By Ajeet Singh Raina, via Dev.to

The NVIDIA Nemotron family is one of the most exciting model releases in recent months: efficient, capable, and increasingly accessible. With Docker Model Runner, pulling and running LLMs locally is as simple as pulling a container image. So naturally, I wanted to see how far I could push Nemotron on my Mac. Here's what I learned.

How Docker Model Runner Works on Mac

On macOS, Docker Model Runner uses the vllm-metal backend, a Metal-accelerated inference stack built on top of mlx-lm for Apple Silicon. This means models run natively in your Mac's unified memory using the GPU, with no cloud dependency and no Docker Linux VM involved in inference. The command to run any Hugging Face model is clean and familiar:

docker model run hf.co/<model-name>

The Nemotron Family: Two Very Different Architectures

Before diving in, it's worth understanding that not all Nemotron models are built the same way. Nemotron-3-Nano-30B is a hybrid SSM (State Space Model): it combines Mamba2 layers with attention layers.
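As a sketch of the workflow described above, here is how a typical pull-and-run session looks with the Docker Model Runner CLI. The `<org>/<model-name>` path is a placeholder, not a specific Nemotron tag; substitute the Hugging Face repository you want to run.

```shell
# Pull a model from Hugging Face so it is cached locally
docker model pull hf.co/<org>/<model-name>

# Start an interactive chat session with the model
docker model run hf.co/<org>/<model-name>

# Or pass a prompt for a one-shot, non-interactive answer
docker model run hf.co/<org>/<model-name> "Explain what a state space model is."

# See which models are available locally
docker model list
```

The pull step is optional: `docker model run` will fetch the model on first use, the same way `docker run` pulls a missing container image.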

Continue reading on Dev.to
