Gemma-4 has an ecosystem-wide agentic bug: under real load (long context + reasoning + content
generation), it emits malformed tool calls and then loops on the broken output. It has been reported
across vLLM, llama.cpp, Ollama, and oobabooga, and Google has not fixed it.
I open-sourced a full diagnosis, a reproduction harness, a working parser-level repair, a stock
NVFP4 quant & recipe, and an experimental format LoRA, with all raw data. Honest findings: I ruled
out quantization as the cause via a BF16 control, and the LoRA reduces the slip but doesn’t fully
eliminate it yet. It overfit my synthetic data, so a bigger, more diverse dataset is likely the real fix.
Everything’s here (Built with Gemma). I’m looking for help pushing the LoRA further with richer data.