
Gemma4 Tool Calling Fixes in llama.cpp, RTX cuBLAS MatMul Bug, & Local Ollama + Whisper UI
Today's Highlights

This week features significant technical updates for local AI: critical fixes for Gemma4 tool calling in llama.cpp, a deep dive into a major cuBLAS performance bug affecting RTX GPUs, and a new local-first UI that pairs Whisper with Ollama for multimodal tasks.

More Gemma4 fixes in the past 24 hours (r/LocalLLaMA)

Source: https://reddit.com/r/LocalLLaMA/comments/1shs6sx/more_gemma4_fixes_in_the_past_24_hours/

Recent updates to llama.cpp address key issues affecting the Gemma4 model, particularly around tool calling and reasoning. A significant "reasoning budget" fix has been merged into the ggml-org/llama.cpp repository via pull request #21697. This fix improves Gemma4's ability to process and generate logical responses in complex tasks. In addition to the reasoning budget fix, Google has released new chat templat…
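The tool-calling fixes are relevant because llama.cpp's server exposes an OpenAI-compatible chat completions endpoint that accepts a `tools` array, which the chat template must render correctly for the model. As a minimal sketch of what such a request body looks like (the model alias "gemma4" and the `get_weather` tool are illustrative assumptions, not details from the article):

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Build an OpenAI-style chat completions body with one declared tool.

    This is a hypothetical example payload; the model alias and tool
    definition are placeholders, not part of the upstream fix.
    """
    return {
        "model": "gemma4",  # assumed local model alias
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # illustrative tool
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

# Serialize the payload as it would be POSTed to the server.
payload = build_tool_call_request("What's the weather in Berlin?")
body = json.dumps(payload)
```

With a correct chat template, the server renders the `tools` array into the model's expected prompt format, which is exactly the part the new Gemma4 templates address.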
Continue reading on Dev.to
