{"slug": "deltatensors-store-model-fine-tunes-as-compressed-weight-deltas", "title": "Deltatensors – store model fine-tunes as compressed weight deltas", "summary": "Deltatensors, a new open-source tool, compresses fine-tuned neural network model deltas into small .wdelta files, achieving near-lossless compression with sub-1% perplexity difference. Tested on Qwen2.5-0.5B fine-tuned on WikiText-2, it reduces storage by 3.2x per delta and ~2.8x across 10 fine-tunes, enabling efficient storage of multiple fine-tuned models from a single base.", "body_md": "**Near-lossless delta compression for fine-tuned neural network models.**\n\nInstead of storing 50 fine-tunes of the same base model, store one base and 50 small `.wdelta` delta files. `deltatensors` compresses the delta between a base and a fine-tuned model, and reconstructs the fine-tune with sub-1% perplexity difference.\n\n**Tested on Qwen2.5-0.5B fine-tuned on WikiText-2:**\n\n- Perplexity: 19.11 (original) → 19.22 (reconstructed), a 0.58% difference\n- Less degradation than standard int4 quantization of the full model\n- 294 MB delta vs 953 MB fine-tuned model (3.2x)\n- ~2.8x total storage reduction across 10 fine-tunes\n\n```\nbase_model.safetensors   1.0 GB\ncheckpoint_01.wdelta     294 MB\ncheckpoint_02.wdelta     294 MB\n...\ncheckpoint_10.wdelta     294 MB\n─────────────────────────────────\nTotal                    3.9 GB    vs  11 GB naive\n```\n\n```\npip install deltatensors\npip install torch safetensors  # for loading from safetensors directories\n```\n\n```python\nimport deltatensors as dt\n\n# save delta between a fine-tuned and base model (streaming, O(1) RAM)\ndt.save_delta_from_paths(\"checkpoint.wdelta\", \"qwen-wiki/\", \"qwen-base/\", strategy=\"int4\")\n\n# reconstruct without loading the full base into RAM\nrecon_sd = dt.load_delta_from_paths(\"checkpoint.wdelta\", \"qwen-base/\")\n\n# inspect a delta file without a base model\ninfo = dt.inspect(\"checkpoint.wdelta\")\nprint(info)\n# {'path': 
'checkpoint.wdelta', 'size_mb': 294.2, 'strategy': 'int4', 'n_tensors': 290, ...}\n```\n\n| Strategy | Quality | Compression |\n|---|---|---|\n| `int4` | near-lossless (~0.5% PPL) | best |\n| `sparse` | tunable via `sparsity=` | good |\n| `quantized` | BitDelta-style 1-bit | aggressive |\n\n`int4` uses outlier extraction (the top k% of weights are stored in float16) plus 4-bit quantization for the remainder. This is the strategy used for the example at the start.\n\nLoRA constrains the delta to be low-rank *during training*, which limits expressiveness. `deltatensors` compresses arbitrary full fine-tune deltas *after training*, with no constraints on how you fine-tune.\n\n**Lineage**: chain multiple `.wdelta` files to track and reconstruct full fine-tuning histories.\n\nMIT\n\np.s. *If you find deltatensors useful, please consider leaving a ⭐ star on the repository to help others find it!*", "url": "https://wpnews.pro/news/deltatensors-store-model-fine-tunes-as-compressed-weight-deltas", "canonical_source": "https://github.com/AaravGaurdev/deltatensors", "published_at": "2026-06-24 04:07:19+00:00", "updated_at": "2026-06-24 04:14:11.061648+00:00", "lang": "en", "topics": ["machine-learning", "ai-tools", "ai-research"], "entities": ["Deltatensors", "Qwen2.5-0.5B", "WikiText-2", "BitDelta", "LoRA"], "alternates": {"html": "https://wpnews.pro/news/deltatensors-store-model-fine-tunes-as-compressed-weight-deltas", "markdown": "https://wpnews.pro/news/deltatensors-store-model-fine-tunes-as-compressed-weight-deltas.md", "text": "https://wpnews.pro/news/deltatensors-store-model-fine-tunes-as-compressed-weight-deltas.txt", "jsonld": "https://wpnews.pro/news/deltatensors-store-model-fine-tunes-as-compressed-weight-deltas.jsonld"}}