Transplant MTP block from one GGUF file into another

A developer has released a Python script that transplants extra tensors, such as Multi-Token Prediction (MTP) layers, from one GGUF file into another, producing a mixed-quantization model. The tool preserves the exact on-disk layout, including the per-row metadata that GPU inference depends on. It can also take a smaller "tensors-only" donor file, which saves download bandwidth. In one example, an MTP block is transplanted from a Q8_0-quantized file into an IQ4_KS base model.

```python
#!/usr/bin/env python3
"""
Transplant extra tensors (e.g. MTP layers) from one GGUF file into another,
producing a mixed-quantization GGUF.

Note: Tested with ik_llama.cpp GGUF Python module.

Usage:
    python convert.py
```
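Much of a transplant like this comes down to bookkeeping: deciding which donor tensors to add and where their bytes land in the merged tensor-data section. The following is a minimal sketch, not the script's actual code. It assumes three things. First, tensors are copied as raw bytes, with each on-disk size taken verbatim from its source file. Second, any tensor the base already has is kept from the base. Third, GGUF's default 32-byte alignment applies (a real file may override this with `general.alignment`). `TensorEntry`, `plan_transplant` and the tensor names in the example are hypothetical. The sketch only plans offsets; it does not read or write GGUF files.

```python
from __future__ import annotations

from dataclasses import dataclass

# GGUF aligns every tensor's data offset to this many bytes unless the file
# sets general.alignment (assumption: the default is in effect here).
GGUF_DEFAULT_ALIGNMENT = 32


@dataclass(frozen=True)
class TensorEntry:
    name: str
    n_bytes: int  # exact on-disk size, copied from the source file, never recomputed
    source: str   # hypothetical label: "base" or "donor"


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def plan_transplant(
    base: dict[str, int],
    donor: dict[str, int],
    alignment: int = GGUF_DEFAULT_ALIGNMENT,
) -> tuple[list[tuple[TensorEntry, int]], int]:
    """Hypothetical planner: keep every base tensor, append donor-only tensors.

    base and donor map tensor name -> on-disk byte size. Returns each entry
    with its offset (relative to the start of the tensor-data section) and
    the end offset of the last tensor.
    """
    entries = [TensorEntry(name, size, "base") for name, size in base.items()]
    # Only tensors missing from the base (e.g. an MTP block) are transplanted.
    entries += [
        TensorEntry(name, size, "donor")
        for name, size in donor.items()
        if name not in base
    ]

    layout: list[tuple[TensorEntry, int]] = []
    offset = 0
    for entry in entries:
        offset = align_up(offset, alignment)  # pad before each tensor
        layout.append((entry, offset))
        offset += entry.n_bytes               # raw bytes, layout preserved
    return layout, offset


if __name__ == "__main__":
    # Hypothetical tensor names and sizes, for illustration only.
    base = {"blk.0.attn_q.weight": 100, "output.weight": 64}
    donor = {"blk.0.attn_q.weight": 200, "blk.46.nextn.eh_proj.weight": 50}
    layout, end = plan_transplant(base, donor)
    for entry, off in layout:
        print(f"{off:>6}  {entry.source:<5}  {entry.name}  ({entry.n_bytes} B)")
    print("tensor data ends at", end)
```

Taking `n_bytes` from the file instead of computing it from element count and block size is deliberate. Some quantization types store extra per-row data, such as a row scale, so the true row size can differ from a naive calculation. Copying the exact bytes keeps that metadata intact.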