{"slug": "show-hn-thaw-git-branch-for-a-running-llm-fork-agents-skip-prefill", "title": "Show HN: Thaw – Git branch for a running LLM (fork agents, skip prefill)", "summary": "A developer released Thaw, an open-source tool that snapshots a live large language model inference session and creates forked branches without re-running prefill, addressing the computational waste of forking agents. The tool, which works with vLLM and SGLang, reports a roughly 400x amortized speedup over cold-booting each fork round on an H100 by preserving the KV cache across forks, the opposite of NVIDIA's Dynamo Snapshot, which frees the KV cache before checkpointing.", "body_md": "I built thaw because forking an LLM agent is absurdly wasteful today. When an agent explores N branches — RL rollouts, best-of-N, parallel coding attempts — each branch re-runs prefill over the same shared context. You pay for the same prompt N times.\n\nthaw snapshots a *live* inference session — weights, KV cache, scheduler state, and the prefix-hash table — and hydrates N children that diverge from the fork point without re-prefilling. It's `git branch` for a running model.\n\nThe receipt (H100 80GB, Llama-3.1-8B, real hardware): a pre-warmed pool boots once in 22.3s, then each fork round of 4 branches × 64 tokens runs in 0.88s median. Cold-boot equivalent would be ~340s/round — ~400× amortized. All rounds bit-identical at the fork boundary. Full JSON receipt + reproducer in the repo, nothing hand-waved.\n\nNVIDIA shipped Dynamo Snapshot last week for fast pod cold-starts — and they free the KV cache before checkpoint, by design. thaw is the opposite bet: preserve the KV cache so a fork is near-free. Different problem, opposite mechanic.\n\npip install thaw-vllm. Works with vLLM and SGLang, Apache-2.0.\n\n[https://github.com/thaw-ai/thaw](https://github.com/thaw-ai/thaw)\n\nI'm a solo dev and this is the thing I most want feedback on: is the fork primitive the right shape, or do people want it wrapped in a framework (LangGraph/TRL) node instead? 
Happy to go deep on the KV-restore internals.\n\nComments URL: [https://news.ycombinator.com/item?id=48341069](https://news.ycombinator.com/item?id=48341069)", "url": "https://wpnews.pro/news/show-hn-thaw-git-branch-for-a-running-llm-fork-agents-skip-prefill", "canonical_source": "https://github.com/thaw-ai/thaw", "published_at": "2026-05-30 22:07:26+00:00", "updated_at": "2026-05-30 22:16:05.725118+00:00", "lang": "en", "topics": ["large-language-models", "ai-infrastructure", "ai-tools", "ai-agents", "machine-learning"], "entities": ["thaw", "vLLM", "SGLang", "NVIDIA", "Dynamo Snapshot", "Llama-3.1-8B", "H100", "LangGraph"], "alternates": {"html": "https://wpnews.pro/news/show-hn-thaw-git-branch-for-a-running-llm-fork-agents-skip-prefill", "markdown": "https://wpnews.pro/news/show-hn-thaw-git-branch-for-a-running-llm-fork-agents-skip-prefill.md", "text": "https://wpnews.pro/news/show-hn-thaw-git-branch-for-a-running-llm-fork-agents-skip-prefill.txt", "jsonld": "https://wpnews.pro/news/show-hn-thaw-git-branch-for-a-running-llm-fork-agents-skip-prefill.jsonld"}}