
Local LLM Inference on Windows 11 and AMD GPU using WSL and llama.cpp
Part 1: Config

GPU: AMD Radeon RX 7800 XT
Driver version: 25.30.27.02-260217a-198634C-AMD-Software-Adrenalin-Edition
llama.cpp SHA: ecd99d6a9acbc436bad085783bcd5d0b9ae9e9e9
OS: Windows 11 (10.0.26200 Build 26200)
Ubuntu version: 24.04

Consult the ROCm compatibility matrix (linked in Part 4) to make sure your ROCm version, GPU, graphics driver, and Ubuntu version are a supported combination.

Part 2: CPU Inference Baseline

Set up WSL and the Ubuntu VM:

```shell
wsl --install -d Ubuntu-24.04
```

Launch "Ubuntu" from the Windows Start menu, then grab some utilities:

```shell
sudo apt update
sudo apt install -y git build-essential cmake curl
```

Clone llama.cpp:

```shell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
```

ecd99d6a9acb was the latest commit at the time of writing; you can `git checkout` that SHA for maximum reproducibility.

Grab the model:

```shell
cd models
curl -L -o mistral.gguf \
  https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf
cd ..
```

Build llama.cpp:

```shell
cmake -B build
cmake --build build --config Release
```

Run CPU inference:
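The build puts its binaries under `build/bin`. A minimal CPU sanity run might look like the sketch below; note that the halved-`nproc` thread heuristic, the prompt, and the 32-token limit are my assumptions for illustration, not values from this guide:

```shell
# Use roughly one thread per physical core: nproc counts logical CPUs,
# so halving it is a common heuristic on SMT/hyperthreaded machines.
THREADS=$(( $(nproc) / 2 ))
if [ "$THREADS" -lt 1 ]; then THREADS=1; fi

# Short generation to confirm the CPU-only build works end to end.
./build/bin/llama-cli -m models/mistral.gguf \
  -p "Explain WSL in one sentence." \
  -n 32 -t "$THREADS"
```

llama.cpp prints timing statistics when the run finishes, which gives you a tokens-per-second baseline to compare against the GPU build later.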




