
# I Tried Running LLMs on Intel's NPU. Here's What Actually Happened.
A hands-on guide to local LLM inference on a Lenovo ThinkPad T14 Gen 5 with an Intel Core Ultra 7 155U, comparing NPU, CPU, and llama.cpp performance.

## The Promise

Intel's "AI Boost" NPU (Neural Processing Unit) ships in every Core Ultra laptop. The marketing suggests it's your on-device AI accelerator, ready to run models locally. I wanted to test that claim by running LLMs on my ThinkPad T14 Gen 5. What followed was a journey through compiler errors, dynamic shape limitations, and some surprising benchmark results.

## My Hardware

- Laptop: Lenovo ThinkPad T14 Gen 5
- CPU: Intel Core Ultra 7 155U (Meteor Lake), 12 cores, 14 logical processors
- RAM: 32 GB DDR5
- NPU: Intel AI Boost (NPU 3720), ~10-11 TOPS, 18 GB shared memory
- GPU: Intel integrated graphics
- OS: Windows 11
- NPU Driver: 32.0.100.4512 (December 2025)

## Attempt 1: The Obvious Approach (OVModelForCausalLM)

The most documented way to run a model on Intel hardware is through optimum-intel and OpenVINO. I exported Qwen2.5-7B-Instruct to OpenVINO format.
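For reference, the "obvious approach" looks roughly like the sketch below, following the optimum-intel documentation. It assumes `optimum[openvino]` and `transformers` are installed; the `export=True` flag and the `"NPU"` device string come from the optimum-intel and OpenVINO docs, and the prompt is just a placeholder.

```python
# Minimal sketch of running a model via optimum-intel + OpenVINO.
# Assumes: pip install optimum[openvino] transformers
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly;
# save_pretrained() afterwards avoids re-exporting on every run.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Select the OpenVINO device plugin: "CPU", "GPU", or "NPU".
model.to("NPU")

inputs = tokenizer("Write a haiku about NPUs.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Whether `model.to("NPU")` actually compiles and runs is exactly the question the rest of the article investigates; the snippet only shows the intended API surface.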



