
# I Started Building a Roguelike RPG — Powered by On-Device AI #3
QNN Failed. LiteRT Failed. Then llama.cpp Delivered 42x Speedup.

I wanted to write a success story today. It turns out I can. But getting there was a bit rough.

## What I Tried Today

| Attempt | Result |
| --- | --- |
| QNN HTP + libcdsprpc.so workaround | HTP initialized, but only 3 of 363 nodes ran on the NPU |
| LiteRT-LM GPU | GPU memory overflow; engine creation failed |
| llama.cpp + Adreno OpenCL | Success: 8.9 tok/s |

## QNN HTP: 3 Out of 363 Nodes

I solved yesterday's libcdsprpc.so access problem. The fix was to use apktool to decompile the APK, inject `uses-native-library` directly into the manifest, and repackage. Not elegant, but it worked.

HTP finally initialized:

```
QnnDsp <W> Initializing HtpProvider ✅
QnnDsp <W> PrepareLibLoader Loading libQnnHtpPrepare.so ✅
```

Then this log appeared:

```
number of nodes in the graph: 363
number of nodes supported by QNN: 3
```

Only 3 of 363 nodes ran on the NPU. The INT4 block-quantization operator (MatMulNBits) isn't supported by HTP, so the remaining 360 nodes fell back to the CPU.

Generation time
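The apktool manifest workaround described above boils down to adding one element to the decompiled manifest. A minimal sketch, assuming an API 31+ target (where apps must declare vendor native libraries before they can load them); the surrounding attributes of the real `<application>` element are omitted here:

```xml
<!-- In the decompiled AndroidManifest.xml (apktool d app.apk → edit → apktool b → re-sign). -->
<!-- Declares libcdsprpc.so so the loader allows dlopen of the vendor library on API 31+.    -->
<application>
    <uses-native-library
        android:name="libcdsprpc.so"
        android:required="false" />
</application>
```

Setting `android:required="false"` lets the same APK still install on devices that don't ship the library.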
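The partition log above makes the failure concrete: with almost every node falling back to CPU, even a fast NPU can't help. A quick Amdahl-style estimate shows why (the assumed 10x per-node NPU speedup is illustrative, not a measurement, and node count is only a rough proxy for compute time):

```python
def amdahl_speedup(offloaded_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when only a fraction of the work runs on the accelerator."""
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / accel_speedup)

# Fraction of graph nodes that QNN accepted, from the partition log: 3 of 363.
coverage = 3 / 363

# Even granting an assumed 10x NPU speedup on the offloaded nodes,
# the end-to-end gain is essentially zero at this coverage level.
print(f"NPU coverage: {coverage:.1%}")
print(f"Best-case overall speedup: {amdahl_speedup(coverage, 10.0):.3f}x")
```

Since the heavy MatMulNBits layers are exactly the ones rejected by HTP, the true compute-time coverage is even worse than the node-count ratio suggests.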
Continue reading on Dev.to
