
# I Started Building a Roguelike RPG — Powered by On-Device AI #3
QNN Failed. LiteRT Failed. Then llama.cpp Delivered 42x Speedup.

I wanted to write a success story today. It turns out I can. But getting there was a bit rough.

## What I Tried Today

| Attempt | Result |
| --- | --- |
| QNN HTP + libcdsprpc.so workaround | HTP initialized, but only 3 of 363 nodes ran on the NPU |
| LiteRT-LM GPU | GPU memory overflow; engine creation failed |
| llama.cpp + Adreno OpenCL | Success: 8.9 tok/s |

## QNN HTP: 3 Out of 363 Nodes

I solved yesterday's libcdsprpc.so access problem. The fix was to use apktool to decompile the APK, inject `uses-native-library` directly into the manifest, and repackage. Not elegant, but it worked.

HTP finally initialized:

```
QnnDsp <W> Initializing HtpProvider ✅
QnnDsp <W> PrepareLibLoader Loading libQnnHtpPrepare.so ✅
```

Then this log appeared:

```
number of nodes in the graph: 363
number of nodes supported by QNN: 3
```

Only 3 of 363 nodes ran on the NPU. The INT4 block-quantization operator (MatMulNBits) isn't supported by HTP, so the remaining 360 nodes fell back to the CPU.

Generation time
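The apktool manifest workaround described above boils down to adding one element to the decompiled manifest. A minimal sketch, assuming an API 31+ target (where apps must declare vendor native libraries before they can load them); the surrounding attributes of the real `<application>` element are omitted here:

```xml
<!-- In the decompiled AndroidManifest.xml (apktool d app.apk → edit → apktool b → re-sign). -->
<!-- Declares libcdsprpc.so so the loader allows dlopen of the vendor library on API 31+.    -->
<application>
    <uses-native-library
        android:name="libcdsprpc.so"
        android:required="false" />
</application>
```

Setting `android:required="false"` lets the same APK still install on devices that don't ship the library.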
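The partition log above makes the failure concrete: with almost every node falling back to CPU, even a fast NPU can't help. A quick Amdahl-style estimate shows why (the assumed 10x per-node NPU speedup is illustrative, not a measurement, and node count is only a rough proxy for compute time):

```python
def amdahl_speedup(offloaded_fraction: float, accel_speedup: float) -> float:
    """Overall speedup when only a fraction of the work runs on the accelerator."""
    return 1.0 / ((1.0 - offloaded_fraction) + offloaded_fraction / accel_speedup)

# Fraction of graph nodes that QNN accepted, from the partition log: 3 of 363.
coverage = 3 / 363

# Even granting an assumed 10x NPU speedup on the offloaded nodes,
# the end-to-end gain is essentially zero at this coverage level.
print(f"NPU coverage: {coverage:.1%}")
print(f"Best-case overall speedup: {amdahl_speedup(coverage, 10.0):.3f}x")
```

Since the heavy MatMulNBits layers are exactly the ones rejected by HTP, the true compute-time coverage is even worse than the node-count ratio suggests.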
Continue reading on Dev.to
