Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support

via Dev.to

NVIDIA Nemotron-Nano-9B-v2-Japanese is a 9B-parameter LLM specialized for Japanese, released by NVIDIA. It is based on the Mamba SSM (State Space Model) architecture, which processes long texts efficiently using an approach different from Transformers. It also supports a Thinking mode (enable_thinking=True), which allows the model to output its reasoning process explicitly.

Environment

OS: Ubuntu (WSL2)
GPU: RTX 5090 (VRAM 32 GB)
Python: 3.13
Package manager: uv

Environment Setup

Dependencies are managed with uv's pyproject.toml. For causal_conv1d and mamba_ssm, pre-built .whl files from their GitHub release pages are specified.

```toml
[project]
name = "nemotron"
version = "0.1.0"
requires-python = "==3.13.*"
dependencies = [
    "accelerate==1.12.0",
    "causal_conv1d",
    "hf-xet==1.2.0",
    "mamba_ssm",
    "torch==2.7.1+cu128",
    "transformers==4.48.3",
    "triton==3.3.1",
]

[[tool.uv.index]]
name = "torch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [{ index = "torch-cuda" }]
```
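To see why an SSM scales linearly with sequence length, here is a minimal sketch of the underlying idea (a toy linear recurrence, not Mamba itself, and the coefficients are arbitrary illustrative values): a fixed-size hidden state is updated once per token, so a sequence of length n costs O(n) time and O(1) state, versus the O(n²) pairwise interactions of full attention.

```python
# Toy state-space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
# Illustrative only; a, b, c are arbitrary constants, not Mamba parameters.
def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    h = 0.0          # fixed-size state, regardless of sequence length
    ys = []
    for x in xs:
        h = a * h + b * x   # one constant-cost state update per token
        ys.append(c * h)    # readout from the state
    return ys

# An impulse at t=0 decays geometrically through the state.
outputs = ssm_scan([1.0, 0.0, 0.0, 0.0])
```

Mamba makes the transition parameters input-dependent (selective), but the constant-memory, one-pass scan shown here is the property that lets it handle long contexts cheaply.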

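When Thinking mode is enabled, the reasoning trace is commonly reported to be wrapped in `<think>...</think>` tags for this model family. A hypothetical helper for separating the trace from the final answer might look like this (the tag convention is an assumption; verify it against the model's actual output):

```python
import re

def split_thinking(text):
    """Split a Thinking-mode completion into (reasoning, answer).

    Assumes the reasoning trace is wrapped in <think>...</think>,
    the reported convention for this model family (unverified here).
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return "", text.strip()          # no trace found: whole text is the answer
    reasoning = m.group(1).strip()       # content inside the tags
    answer = text[m.end():].strip()      # everything after the closing tag
    return reasoning, answer

reasoning, answer = split_thinking("<think>Check the capital.</think>Tokyo.")
```

Separating the trace this way keeps the visible answer clean while still letting you log or inspect the model's reasoning.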
