
# I Tried Running LLMs on Intel's NPU. Here's What Actually Happened.
A hands-on guide to local LLM inference on a Lenovo ThinkPad T14 Gen 5 with an Intel Core Ultra 7 155U, comparing NPU, CPU, and llama.cpp performance.

## The Promise

Intel's "AI Boost" NPU (Neural Processing Unit) ships in every Core Ultra laptop. The marketing suggests it's your on-device AI accelerator, ready to run models locally. I wanted to test that claim by running LLMs on my ThinkPad T14 Gen 5. What followed was a journey through compiler errors, dynamic shape limitations, and some surprising benchmark results.

## My Hardware

- Laptop: Lenovo ThinkPad T14 Gen 5
- CPU: Intel Core Ultra 7 155U (Meteor Lake), 12 cores, 14 logical processors
- RAM: 32 GB DDR5
- NPU: Intel AI Boost (NPU 3720), ~10-11 TOPS, 18 GB shared memory
- GPU: Intel integrated graphics
- OS: Windows 11
- NPU Driver: 32.0.100.4512 (December 2025)

## Attempt 1: The Obvious Approach (OVModelForCausalLM)

The most documented way to run a model on Intel hardware is through optimum-intel and OpenVINO. I exported Qwen2.5-7B-Instruct to OpenVINO format.
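For reference, the "obvious approach" looks roughly like the sketch below, following the optimum-intel documentation. It assumes `optimum[openvino]` and `transformers` are installed; the `export=True` flag and the `"NPU"` device string come from the optimum-intel and OpenVINO docs, and the prompt is just a placeholder.

```python
# Minimal sketch of running a model via optimum-intel + OpenVINO.
# Assumes: pip install optimum[openvino] transformers
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# save_pretrained() afterwards avoids re-exporting on every run.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Select the OpenVINO device plugin: "CPU", "GPU", or "NPU".
model.to("NPU")

inputs = tokenizer("Write a haiku about NPUs.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Whether `model.to("NPU")` actually compiles and runs is exactly the question the rest of the article investigates; the snippet only shows the intended API surface.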



