
Gemma4 Tool Calling Fixes in llama.cpp, RTX cuBLAS MatMul Bug, & Local Ollama + Whisper UI
Today's Highlights

This week features significant technical updates for local AI: critical fixes for Gemma4 tool calling in llama.cpp, a deep dive into a major cuBLAS performance bug affecting RTX GPUs, and a new local-first UI that pairs Whisper with Ollama for multimodal tasks.

More Gemma4 fixes in the past 24 hours (r/LocalLLaMA)

Source: https://reddit.com/r/LocalLLaMA/comments/1shs6sx/more_gemma4_fixes_in_the_past_24_hours/

Recent updates to llama.cpp address key issues affecting the Gemma4 model, particularly around tool calling and reasoning. A significant "reasoning budget" fix has been merged into the ggml-org/llama.cpp repository via pull request #21697. This fix improves Gemma4's ability to process and generate logical responses in complex tasks. In addition to the reasoning budget fix, Google has released new chat templat…
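The tool-calling fixes are relevant because llama.cpp's server exposes an OpenAI-compatible chat completions endpoint that accepts a `tools` array, which the chat template must render correctly for the model. As a minimal sketch of what such a request body looks like (the model alias "gemma4" and the `get_weather` tool are illustrative assumptions, not details from the article):

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Build an OpenAI-style chat completions body with one declared tool.

    This is a hypothetical example payload; the model alias and tool
    definition are placeholders, not part of the upstream fix.
    """
    return {
        "model": "gemma4",  # assumed local model alias
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # illustrative tool
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

# Serialize the payload as it would be POSTed to the server.
payload = build_tool_call_request("What's the weather in Berlin?")
body = json.dumps(payload)
```

With a correct chat template, the server renders the `tools` array into the model's expected prompt format, which is exactly the part the new Gemma4 templates address.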
Continue reading on Dev.to
