{"slug": "caching-llm-responses-is-just-content-addressing", "title": "Caching LLM responses is just content addressing", "summary": "CommitBrief caches LLM reviews using content-addressed hashing, where each cache key is a SHA-256 hash of the diff, system prompt, provider, model, language, and schema version. This design ensures that cache hits are provably identical to the original review, and invalidating stale entries requires no explicit logic—changing any input automatically produces a new key. The cache is stored as JSON files on disk, and lookups skip cost estimation entirely, making re-runs on unchanged diffs instant and free.", "body_md": "**An LLM review costs money and a few seconds of latency. Reviewing the same diff twice should cost neither.** CommitBrief caches every review, but the interesting part isn't *that* it caches — it's that the cache is content-addressed, so a hit is provably the same review, and there is no such thing as a stale one. Editing a single line of your rules file invalidates exactly the entries it should, and not one more, with zero invalidation logic anywhere in the code.\n\n**TL;DR**\n\nEverything good about this cache falls out of one function. 
Here it is, complete:\n\n```\nfunc Compute(args ComputeArgs) string {\n    h := sha256.New()\n    h.Write([]byte(args.Diff))\n    h.Write([]byte(\"::\"))\n    h.Write([]byte(args.SystemPrompt))\n    h.Write([]byte(\"::\"))\n    h.Write([]byte(args.Provider))\n    h.Write([]byte(\":\"))\n    h.Write([]byte(args.Model))\n    h.Write([]byte(\":\"))\n    h.Write([]byte(args.Lang))\n    h.Write([]byte(\":\"))\n    h.Write([]byte(strconv.Itoa(SchemaVersion)))\n    if args.WithContext {\n        h.Write([]byte(\":ctx\"))\n    }\n    if args.Mode != \"\" {\n        h.Write([]byte(\":mode:\" + args.Mode))\n    }\n    return hex.EncodeToString(h.Sum(nil))\n}\n```\n\nEach input is in the key because each one can change the output:\n\n- The diff under review.\n- The system prompt, which carries your `COMMITBRIEF.md` rules, the severity rubric, the response-format contract, and any architecture constraints. This is the load-bearing one for invalidation, below.\n- The provider, model, and language.\n- The schema version, currently `1`. Bump it and every key changes, so no existing entry can match.\n\nThe two trailing markers are a lesson in not breaking your own cache. `:ctx` and `:mode:` are appended *only when set*. A plain review writes neither, so its key is byte-identical to what the same review produced three versions ago — adding the `--with-context` feature and the `commit` mode didn't invalidate anybody's existing cache. New behavior gets new key-space; unchanged behavior keeps its old keys. 
That discipline is why upgrades don't silently nuke everyone's cache on the first run.\n\nLookup is a file read, an unmarshal, and two guards:\n\n```\nfunc (c *Cache) Get(key string) (Entry, bool) {\n    path := c.entryPath(key)\n    data, err := os.ReadFile(path)\n    if err != nil {\n        return Entry{}, false\n    }\n    var e Entry\n    if err := json.Unmarshal(data, &e); err != nil {\n        _ = os.Remove(path) // corrupt entry: drop it, next write replaces\n        return Entry{}, false\n    }\n    if e.Version != SchemaVersion {\n        return Entry{}, false\n    }\n    if e.ExpiredAt(c.now()) {\n        return Entry{}, false\n    }\n    return e, true\n}\n```\n\nNo network, no tokens. And because the lookup happens *before* the cost preflight in the pipeline, a hit skips the cost estimate altogether — there's nothing to estimate when you're not calling anyone. On an unchanged diff, a re-run is effectively instant and free.\n\nThis is the payoff. There is no `invalidateCacheAfterEditingRules()` anywhere in the codebase, because it would be dead code. The system prompt is *in* the key, and your rules are *in* the system prompt. So the moment you change one line of `COMMITBRIEF.md`, the assembled prompt's bytes change, its SHA-256 changes, and the old entry's key is one nobody will ever compute again. The stale review isn't deleted — it's unreachable, and the next review writes a fresh entry under the new key.\n\nContent addressing means a cache hit is, by construction, a review produced from byte-identical inputs. 
There's no heuristic deciding whether a cached answer is \"still valid,\" because validity isn't a question you can ask of a content-addressed store — the inputs either hash to the same key or they don't.\n\nA cache entry is one JSON file per response:\n\n```\ntype Entry struct {\n    Version   int       `json:\"version\"`\n    CreatedAt time.Time `json:\"created_at\"`\n    TTL       int64     `json:\"ttl\"`\n    Key       KeyMeta   `json:\"key\"`\n    Result    Result    `json:\"result\"`\n}\n```\n\n`Result` carries a `Format` marker — `json`, `markdown-fallback`, or `plain-text` — so a degraded review ([post 3](https://dev.to/muhammetsafak/getting-structured-json-out-of-five-incompatible-llm-apis-and-degrading-when-they-ignore-you-27jg)) or a CLI provider's pre-formatted output replays down exactly the right renderer path, with no warning re-emitted on a cache hit. Writes are atomic: serialize to a temp file, then rename into place.\n\n```\ntmp := path + \".tmp\"\nif err := os.WriteFile(tmp, data, 0o600); err != nil {\n    return err\n}\nreturn os.Rename(tmp, path)\n```\n\n`os.Rename` is atomic on a POSIX filesystem, so a crash mid-write leaves a `.tmp` file, never a half-written entry that would later unmarshal into garbage. Mode `0600` keeps the cached review readable only by you. And the first successful write appends `.commitbrief/` to the repo's `.gitignore`, so your cache never lands in a commit.\n\nLeft alone, the cache grows. Two mechanisms keep it in check. If `cache.max_size_mb` is set, an eviction sweep runs *after* each write — oldest-first by `CreatedAt` (file mtime as fallback) — until the total fits, and the just-written entry is always protected, so a single review larger than the budget still survives the write that created it. Entries also carry a TTL, defaulting to seven days. 
And you can prune by hand:\n\n```\ncommitbrief cache stats                          # count, size, age, per-provider\ncommitbrief cache prune --keep-last 500 --older-than 7d\ncommitbrief cache inspect <key> --show-content   # one entry's metadata + body\n```\n\n`prune` keeps an entry only if it's inside *both* windows — among the newest 500 *and* younger than seven days.\n\nWhen CommitBrief does call a provider, a cost preflight runs first: it estimates input tokens at roughly four characters each, guesses output conservatively (floored at 200 tokens, capped at 1500 — a structured review rarely runs longer), multiplies by the model's price table, and prompts only if the estimate clears your threshold (`cost.warn_threshold_usd`, default `$0.50`). A cache hit skips that whole machine. On a paid provider, the second review of an unchanged diff costs literally nothing; on a local Ollama model ([post 5](https://dev.to/muhammetsafak/air-gapped-code-review-with-ollama-when-the-diff-never-leaves-the-machine-4kb8)) it was already free, but the cache still saves you the inference seconds.\n\nA cache hit replays the first answer verbatim — including its mistakes. The cache makes a re-run free; it does not make it *better*. If the model missed something the first time, the cached entry will keep missing it until the inputs change or you force a fresh call with `--no-cache`. And the store is deliberately repo-local: `.commitbrief/cache/` on your machine, never a shared team server, because there isn't one — the same local-first stance that runs through everything else. The cache saves you tokens and time; it doesn't pretend to be a source of truth.\n\nRepo: **github.com/CommitBrief/commitbrief**.\n\n*Part 6 of **Building CommitBrief**. 
Next: exposing the whole review pipeline as a Model Context Protocol tool — JSON-RPC over stdio, in standard-library Go, with zero new dependencies.*", "url": "https://wpnews.pro/news/caching-llm-responses-is-just-content-addressing", "canonical_source": "https://dev.to/muhammetsafak/caching-llm-responses-is-just-content-addressing-2102", "published_at": "2026-06-29 00:00:00+00:00", "updated_at": "2026-06-29 00:27:42.689626+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-tools", "ai-infrastructure"], "entities": ["CommitBrief", "SHA-256", "JSON"], "alternates": {"html": "https://wpnews.pro/news/caching-llm-responses-is-just-content-addressing", "markdown": "https://wpnews.pro/news/caching-llm-responses-is-just-content-addressing.md", "text": "https://wpnews.pro/news/caching-llm-responses-is-just-content-addressing.txt", "jsonld": "https://wpnews.pro/news/caching-llm-responses-is-just-content-addressing.jsonld"}}