Gemma 4 Bug Fixes and Research Request

A critical bug in Google's Gemma 4 causes it to malform tool calls under real load, with reports across vLLM, llama.cpp, Ollama, and oobabooga. A developer open-sourced a diagnosis, a parser-level repair, and an experimental LoRA. The LoRA reduces the issue but doesn't eliminate it, and the developer is calling for community help with richer data.

Gemma 4 has an ecosystem-wide agentic bug: under real load (long context + reasoning + generating content), it malforms its own tool calls and then loops on the broken output. The bug has been reported across vLLM, llama.cpp, Ollama, and oobabooga, and remains unfixed by Google. I open-sourced a full diagnosis, a reproduction harness, a working parser-level repair, a stock NVFP4 quant and recipe, and an experimental format LoRA, with all raw data. Honest findings: I ruled out quantization as the cause via a BF16 control. The LoRA reduces the slip but doesn’t fully eliminate it yet; it overfit my synthetic data, so a bigger, more diverse dataset is likely the real fix. Everything’s here, Built with Gemma. I’m looking for help pushing the LoRA further with richer data.
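
To give a rough idea of what a parser-level repair and a loop check can look like, here is a minimal sketch. It is not the repair from the release. It assumes, purely for illustration, that a tool call arrives as a JSON object with a "name" string and an "arguments" object, which may not match Gemma 4's actual tool-call format. The function names `repair_tool_call` and `is_looping` are hypothetical.

```python
import json
import re


def _validate(obj):
    # Assumed shape (hypothetical): {"name": str, "arguments": dict}.
    if (isinstance(obj, dict)
            and isinstance(obj.get("name"), str)
            and isinstance(obj.get("arguments"), dict)):
        return obj
    return None


def repair_tool_call(raw):
    """Parse a tool call; on failure, apply small textual repairs.

    Returns the parsed dict, or None if the text cannot be repaired.
    """
    text = raw.strip()
    try:
        return _validate(json.loads(text))
    except json.JSONDecodeError:
        pass

    # Repair 1: close a dangling string and any unclosed braces/brackets.
    # String contents are skipped so braces inside strings are not counted.
    stack = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack and stack[-1] == ch:
            stack.pop()
    if in_string:
        text += '"'
    text += "".join(reversed(stack))

    # Repair 2: drop trailing commas before a closer. This is a crude regex
    # that could also touch a ",}" sequence inside a string value.
    text = re.sub(r",\s*([}\]])", r"\1", text)

    try:
        return _validate(json.loads(text))
    except json.JSONDecodeError:
        return None


def is_looping(outputs, window=3):
    """True if the last `window` model outputs are identical."""
    tail = outputs[-window:]
    return len(tail) == window and len(set(tail)) == 1
```

In an agent loop, a sketch like this would sit between the model output and the tool dispatcher: repair first, and if `repair_tool_call` returns None or `is_looping` fires, stop feeding the broken output back to the model instead of retrying on it.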