What's in a GGUF, besides the weights - and what's still missing?

The GGUF file format consolidates a model's components (weights, chat templates, and special tokens) into a single file, offering a more ergonomic alternative to the scattered JSON files typical of safetensors repos or the layered OCI structure used by Ollama. However, the format still relies on external Jinja2 template interpreters to handle complex conversational features like reasoning blocks, tool calls, and multimedia messages, and performance varies across implementations such as llama.cpp's custom Jinja engine and minijinja. The absence of a standardized, high-performance template execution system within GGUF itself remains a notable gap for local LLM applications.

GGUF is the file format that llama.cpp (https://github.com/ggml-org/llama.cpp) uses for language models. The really neat thing about GGUF is that it's just one file. Compare this to a typical safetensors repo on Hugging Face (https://huggingface.co/Qwen/Qwen3.5-0.8B/tree/main), where there's a pile of necessary JSON files scattered around, or to a typical Ollama model (https://ollama.com/library/qwen3.5:0.8b), which is an OCI image with layers holding JSON, Go templates, etc. The contents are roughly the same, but GGUF makes it more ergonomic by keeping all this stuff in a single file. But what is this stuff, and does it cover everything needed?

Chat Templates

Conversational language models are trained on sequences that follow a specific format, one that loosely resembles a conversation. For instance, Gemma4's format looks like this: <|turn user Hi there