
Shogi AI with RTX 5090 — Record of TensorRT FP8 Quantization and Floodgate Practical Games
What is dlshogi?

dlshogi is a Shogi engine incorporating deep learning, consisting of a C++ implementation and a Python wrapper. It operates with ONNX Runtime and DirectML on Windows, and TensorRT and CUDA on Linux. In this project, we implemented it leveraging TensorRT in a WSL2 Ubuntu 24.04 environment equipped with an RTX 5090.

Key features are as follows:

Evaluation value generation by neural network
Hybrid approach with traditional αβ search
Coordinated control of multiple engines via wrapper scripts

Fuka40B Model Architecture

Fuka40B is a model adopting the ResNet40x384 architecture with 107.2M parameters. It uses the ReLU activation function and is designed to accurately evaluate Shogi board positions.

Training Data: Distilled dataset
Optimization Algorithm: AdamW (learning rate 0.00005, weight decay 0.01)
Batch Size: 4096

Effects of FP8 Quantization

TensorRT's FP8 quantization is a technique that improves inference speed while reducing VRAM load. Compared to INT4, FP8 has small
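
As a rough illustration of the VRAM point, the sketch below estimates weights-only storage for a 107.2M-parameter model at several precisions, and checks how much of that parameter count a 40-block, 384-channel residual trunk would explain. The block layout (two bias-free 3x3 convolutions per block) and the helper names `weight_megabytes` and `trunk_conv_params` are assumptions made for this example, not details taken from dlshogi or Fuka40B. Activations, TensorRT workspace, and quantization scale factors are ignored.

```python
# Weights-only memory estimate; activations and engine workspace are not counted.
PARAMS = 107_200_000  # parameter count stated for Fuka40B in the article

# Bits used to store one weight at each precision.
BITS_PER_WEIGHT = {"FP32": 32, "FP16": 16, "FP8": 8, "INT4": 4}


def weight_megabytes(params: int, bits: int) -> float:
    """Return weight storage in MB (10**6 bytes) for the given bit width."""
    return params * bits / 8 / 1e6


def trunk_conv_params(blocks: int = 40, channels: int = 384, kernel: int = 3) -> int:
    """Count conv weights in the trunk.

    Assumed layout (hypothetical): each residual block holds two bias-free
    kernel x kernel convolutions with `channels` inputs and outputs.
    """
    return blocks * 2 * kernel * kernel * channels * channels


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: {weight_megabytes(PARAMS, bits):.1f} MB")
    print(f"trunk conv params: {trunk_conv_params():,}")
```

Under these assumptions the weights take about 107 MB in FP8, half the FP16 figure and twice the INT4 figure. The assumed trunk accounts for roughly 106.2M of the 107.2M parameters, which would leave the remainder to the input layers, normalization, and output heads.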




