
Running NVIDIA Nemotron-Nano-9B-v2-Japanese Locally: Mamba SSM + Thinking Mode Support
NVIDIA Nemotron-Nano-9B-v2-Japanese is a 9B-parameter LLM specialized for Japanese, released by NVIDIA. It is based on the Mamba SSM (State Space Model) architecture, which processes long texts efficiently using an approach different from Transformers. It also supports a Thinking mode (`enable_thinking=True`), which makes the model explicitly output its reasoning process.

## Environment

- OS: Ubuntu (WSL2)
- GPU: RTX 5090 (32 GB VRAM)
- Python: 3.13
- Package manager: uv

## Environment Setup

Dependencies are managed with uv's `pyproject.toml`. For `causal_conv1d` and `mamba_ssm`, pre-built `.whl` files from their GitHub release pages are specified.

```toml
[project]
name = "nemotron"
version = "0.1.0"
requires-python = "==3.13.*"
dependencies = [
    "accelerate==1.12.0",
    "causal_conv1d",
    "hf-xet==1.2.0",
    "mamba_ssm",
    "torch==2.7.1+cu128",
    "transformers==4.48.3",
    "triton==3.3.1",
]

[[tool.uv.index]]
name = "torch-cuda"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

[tool.uv.sources]
torch = [{ index = "torch-cuda" }]
```
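As background for why a Mamba-style SSM handles long inputs efficiently: it carries a fixed-size hidden state through one linear pass over the sequence, instead of attending over all previous tokens. The following is a heavily simplified, illustrative sketch of a scalar state-space recurrence, not the actual Mamba implementation (real Mamba uses selective, input-dependent parameters and a hardware-efficient parallel scan):

```python
def ssm_scan(inputs, a=0.9, b=0.1, c=1.0):
    """Minimal linear state-space recurrence (illustrative only).

        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t

    Cost is O(T) in sequence length with O(1) state per step,
    which is the key contrast with attention's O(T^2) pairwise
    token interactions.
    """
    h = 0.0
    outputs = []
    for x in inputs:
        h = a * h + b * x      # fixed-size state update
        outputs.append(c * h)  # readout
    return outputs

# An impulse input decays geometrically through the state
print(ssm_scan([1.0, 0.0, 0.0]))
```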
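With Thinking mode enabled, the reasoning trace is emitted inline before the final answer, so downstream code usually wants to separate the two. A small helper like this can do that — note this is a hypothetical sketch assuming the trace is wrapped in `<think>...</think>` tags, as reasoning-tuned models commonly do; check the model card for the exact delimiters:

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (thinking, answer).

    Assumes the reasoning trace is delimited by <think>...</think>
    (an assumption, not a documented guarantee); if no such block
    is present, the whole text is treated as the answer.
    """
    start = text.find("<think>")
    end = text.find("</think>")
    if start == -1 or end == -1:
        return "", text.strip()
    thinking = text[start + len("<think>"):end].strip()
    answer = text[end + len("</think>"):].strip()
    return thinking, answer

reasoning, answer = split_thinking(
    "<think>Check the units first.</think>The answer is 42."
)
```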
Continue reading on Dev.to



