[ARTICLE · art-37177] src=github.com ↗ pub= topic=machine-learning verified=true sentiment=↑ positive

Deltatensors – store model fine-tunes as compressed weight deltas

Deltatensors, a new open-source tool, compresses fine-tuned neural network model deltas into small .wdelta files, achieving near-lossless compression with sub-1% perplexity difference. Tested on Qwen2.5-0.5B fine-tuned on WikiText-2, it reduces storage by 3.2x per delta and ~2.8x across 10 fine-tunes, enabling efficient storage of multiple fine-tuned models from a single base.

1 min read · 6 views · published Jun 24, 2026

Near-lossless delta compression for fine-tuned neural network models.

Instead of storing 50 fine-tunes of the same base model, store one base and 50 small .wdelta delta files. deltatensors compresses the delta between a base and a fine-tuned model and reconstructs the fine-tune with a sub-1% perplexity difference.
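The idea of storing a delta and rebuilding the fine-tune from the base can be sketched in a few lines. This is only an illustration with NumPy arrays standing in for model weights; `compute_delta` and `apply_delta` are hypothetical helper names, not part of the deltatensors API, and no compression is applied here.

```python
import numpy as np

def compute_delta(base_sd, tuned_sd):
    """Per-tensor difference (tuned - base) for tensors present in both dicts."""
    return {name: tuned_sd[name] - base_sd[name]
            for name in base_sd if name in tuned_sd}

def apply_delta(base_sd, delta_sd):
    """Rebuild fine-tuned weights as base + delta; tensors without a delta stay unchanged."""
    return {name: w + delta_sd.get(name, 0.0) for name, w in base_sd.items()}

# Toy "state dicts": one small weight matrix, nudged slightly by fine-tuning.
rng = np.random.default_rng(0)
base = {"layer.weight": rng.normal(size=(4, 4)).astype(np.float32)}
tuned = {"layer.weight": base["layer.weight"]
         + (0.01 * rng.normal(size=(4, 4))).astype(np.float32)}

delta = compute_delta(base, tuned)
recon = apply_delta(base, delta)
max_err = float(np.max(np.abs(recon["layer.weight"] - tuned["layer.weight"])))
print(f"max reconstruction error: {max_err:.2e}")
```

With an uncompressed delta the reconstruction matches the fine-tune up to floating-point rounding; the quality loss reported in this article comes from compressing the delta, not from the subtraction itself.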

Tested on Qwen2.5-0.5B fine-tuned on WikiText-2:

  • Perplexity: 19.11 (original) → 19.22 (reconstructed) — 0.58% perplexity difference
  • Less degradation than standard int4 quantization of the full model
  • 294 MB delta vs 953 MB fine-tuned model (3.2x)
  • ~2.8x total storage reduction across 10 fine-tunes
base_model.safetensors   1.0 GB
checkpoint_01.wdelta     294 MB
checkpoint_02.wdelta     294 MB
...
checkpoint_10.wdelta     294 MB
─────────────────────────────────
Total                    3.9 GB    vs  11 GB naive
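
The figures above can be checked with simple arithmetic. The sketch below only re-derives the ratios from the numbers quoted in this article; it assumes 1 GB = 1000 MB and takes the 11 GB naive total as given.

```python
# Figures quoted in the article.
base_gb = 1.0          # base_model.safetensors
delta_mb = 294         # one .wdelta file
tuned_mb = 953         # one full fine-tuned model
n_finetunes = 10
naive_total_gb = 11.0  # naive total as stated in the table
ppl_original, ppl_reconstructed = 19.11, 19.22

per_delta_ratio = tuned_mb / delta_mb
compressed_total_gb = base_gb + n_finetunes * delta_mb / 1000  # assumes 1 GB = 1000 MB
total_ratio = naive_total_gb / compressed_total_gb
ppl_diff_pct = (ppl_reconstructed - ppl_original) / ppl_original * 100

print(f"per-delta reduction: {per_delta_ratio:.1f}x")         # 3.2x
print(f"compressed total:    {compressed_total_gb:.1f} GB")   # 3.9 GB
print(f"total reduction:     {total_ratio:.1f}x")             # 2.8x
print(f"perplexity change:   {ppl_diff_pct:.2f}%")            # 0.58%
```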

pip install deltatensors
pip install torch safetensors  # needed to load models from safetensors directories

import deltatensors as dt

dt.save_delta_from_paths("checkpoint.wdelta", "qwen-wiki/", "qwen-base/", strategy="int4")

recon_sd = dt.load_delta_from_paths("checkpoint.wdelta", "qwen-base/")

info = dt.inspect("checkpoint.wdelta")
print(info)

Strategy    Quality                      Compression
int4        near-lossless (~0.5% PPL)    best
sparse      tunable via sparsity=        good
quantized   BitDelta-style 1-bit         aggressive
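
To make the sparse and 1-bit rows more concrete, here is a rough NumPy sketch of what magnitude-based sparsification and a BitDelta-style sign-plus-scale encoding generally look like. How deltatensors implements these strategies is not described in the article, so the function names and the `keep_fraction` parameter are assumptions for illustration only (the exact meaning of the library's `sparsity=` argument is not stated).

```python
import numpy as np

def sparsify_delta(delta, keep_fraction):
    """Keep the largest-magnitude entries of the delta and zero out the rest."""
    k = max(1, int(round(keep_fraction * delta.size)))
    magnitudes = np.abs(delta).ravel()
    # Value of the k-th largest magnitude, used as the keep threshold.
    threshold = np.partition(magnitudes, delta.size - k)[delta.size - k]
    return np.where(np.abs(delta) >= threshold, delta, 0.0)

def one_bit_encode(delta):
    """Store only the sign of each entry plus one scale for the whole tensor."""
    scale = float(np.mean(np.abs(delta)))
    signs = np.where(delta >= 0, 1, -1).astype(np.int8)
    return signs, scale

def one_bit_decode(signs, scale):
    """Every restored entry is +scale or -scale."""
    return signs.astype(np.float32) * scale

rng = np.random.default_rng(0)
demo_delta = 0.01 * rng.normal(size=(64, 64))

sparse_delta = sparsify_delta(demo_delta, keep_fraction=0.1)
signs, scale = one_bit_encode(demo_delta)
restored = one_bit_decode(signs, scale)
print(np.count_nonzero(sparse_delta), "of", demo_delta.size, "entries kept")
```

The sparse variant trades quality for size through the fraction kept, while the 1-bit variant fixes the cost at one bit per weight plus a scale, which is why it sits at the aggressive end of the table.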

int4 uses outlier extraction (top k% weights stored in float16) + 4-bit quantization for the remainder. This was the strategy used for the example at the start.
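
A minimal sketch of that idea, assuming a single per-tensor scale and symmetric rounding, could look like the following. It is not the library's implementation: the function names, the `outlier_fraction` parameter, and the choice to keep 4-bit codes in an int8 array (rather than packing two per byte) are all simplifications made for illustration.

```python
import numpy as np

def compress_int4_with_outliers(delta, outlier_fraction=0.01):
    """Store the largest-magnitude entries in float16, quantize the rest to 4-bit codes."""
    flat = delta.astype(np.float32).ravel()
    k = max(1, int(round(outlier_fraction * flat.size)))
    # Indices of the k largest-magnitude entries (the outliers).
    outlier_idx = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    outlier_val = flat[outlier_idx].astype(np.float16)

    rest = flat.copy()
    rest[outlier_idx] = 0.0
    max_abs = float(np.max(np.abs(rest)))
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    # Symmetric codes in [-7, 7]; a real format would pack two codes per byte.
    codes = np.clip(np.round(rest / scale), -7, 7).astype(np.int8)
    return {"shape": delta.shape, "scale": scale, "codes": codes,
            "outlier_idx": outlier_idx, "outlier_val": outlier_val}

def decompress_int4_with_outliers(packed):
    """Dequantize the codes, then overwrite the outlier positions with their float16 values."""
    flat = packed["codes"].astype(np.float32) * packed["scale"]
    flat[packed["outlier_idx"]] = packed["outlier_val"].astype(np.float32)
    return flat.reshape(packed["shape"])

rng = np.random.default_rng(0)
demo = (0.01 * rng.normal(size=(32, 32))).astype(np.float32)
packed = compress_int4_with_outliers(demo, outlier_fraction=0.02)
restored = decompress_int4_with_outliers(packed)
print("max abs error:", float(np.max(np.abs(restored - demo))))
```

Pulling the outliers out first shrinks the range the 4-bit grid has to cover, so the quantization step for the remaining weights is smaller than it would be if the largest values set the scale.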

LoRA constrains the delta to be low-rank during training, which limits expressiveness. deltatensors compresses arbitrary full fine-tune deltas after training, with no constraints on how you fine-tune.
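
The rank difference is easy to see numerically. In the sketch below, a LoRA-style update is the product of two thin matrices, while a random dense matrix stands in for a full fine-tune delta; the sizes `d` and `r` are arbitrary illustrative values, and real fine-tune deltas are of course not random noise.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2  # hypothetical layer width and LoRA rank

# LoRA-style update: product of a (d x r) and an (r x d) matrix, so rank <= r.
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, d))
lora_delta = B @ A

# Stand-in for a full fine-tune update: every entry can change independently.
full_delta = 0.01 * rng.normal(size=(d, d))

lora_rank = int(np.linalg.matrix_rank(lora_delta))
full_rank = int(np.linalg.matrix_rank(full_delta))
print("LoRA delta rank:", lora_rank, "| full delta rank:", full_rank)
```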

Lineage: chain multiple .wdelta files to track and reconstruct full fine-tuning histories.
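
Conceptually, a lineage can be replayed by applying deltas one after another. The sketch below assumes each delta is relative to the previous checkpoint and works on in-memory NumPy dicts; the article does not describe how deltatensors stores or resolves chains, so `apply_delta` and `reconstruct_history` are hypothetical names.

```python
import numpy as np

def apply_delta(weights, delta):
    """Return new weights = weights + delta; tensors without a delta are unchanged."""
    return {name: w + delta.get(name, 0.0) for name, w in weights.items()}

def reconstruct_history(base, deltas):
    """Apply deltas in order and return every intermediate checkpoint."""
    history = []
    current = base
    for delta in deltas:
        current = apply_delta(current, delta)
        history.append(current)
    return history

base = {"w": np.zeros((2, 2))}
chain = [{"w": np.full((2, 2), 1.0)},   # checkpoint 1 relative to base
         {"w": np.full((2, 2), 2.0)}]   # checkpoint 2 relative to checkpoint 1

history = reconstruct_history(base, chain)
print(history[-1]["w"])
```

Any checkpoint in the chain can be recovered by stopping the replay early, at the cost of applying every delta that precedes it.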

License: MIT

p.s. If you find deltatensors useful, please consider leaving a ⭐ star on the repository to help others find it!
