Caching LLM responses is just content addressing

CommitBrief caches LLM reviews using content-addressed hashing, where each cache key is a SHA-256 hash of the diff, system prompt, provider, model, language, and schema version. This design ensures that cache hits are provably identical to the original review, and invalidating stale entries requires no explicit logic—changing any input automatically produces a new key. The cache is stored as JSON files on disk, and lookups skip cost estimation entirely, making re-runs on unchanged diffs instant and free.

An LLM review costs money and a few seconds of latency. Reviewing the same diff twice should cost neither. CommitBrief caches every review, but the interesting part isn't that it caches; it's that the cache is content-addressed, so a hit is provably the same review, and there is no such thing as a stale one. Editing a single line of your rules file invalidates exactly the entries it should, and not one more, with zero invalidation logic anywhere in the code.

TL;DR: everything good about this cache falls out of one function. Here it is, complete:

```go
// Compute derives the cache key from every input that can change the review.
// It relies on crypto/sha256, encoding/hex, and strconv.
func Compute(args ComputeArgs) string {
	h := sha256.New()
	h.Write([]byte(args.Diff))
	h.Write([]byte("::"))
	h.Write([]byte(args.SystemPrompt))
	h.Write([]byte("::"))
	h.Write([]byte(args.Provider))
	h.Write([]byte(":"))
	h.Write([]byte(args.Model))
	h.Write([]byte(":"))
	h.Write([]byte(args.Lang))
	h.Write([]byte(":"))
	h.Write([]byte(strconv.Itoa(SchemaVersion)))
	// Optional markers are written only when set, so plain reviews keep their old keys.
	if args.WithContext {
		h.Write([]byte(":ctx"))
	}
	if args.Mode != "" {
		h.Write([]byte(":mode:" + args.Mode))
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

Each input is in the key because each one can change the output. The system prompt carries the `COMMITBRIEF.md` rules, the severity rubric, the response-format contract, and any architecture constraints; it is the load-bearing one for invalidation, below. The schema version is currently `1`; bump it and every existing key changes, because the version is hashed into each one.

The two trailing markers are a lesson in not breaking your own cache. `:ctx` and `:mode:` are appended only when set. A plain review writes neither, so its key is byte-identical to what the same review produced three versions ago: adding the `--with-context` feature and the commit mode didn't invalidate anybody's existing cache. New behavior gets new key-space; unchanged behavior keeps its old keys. That discipline is why upgrades don't silently nuke everyone's cache on the first run.
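The key-stability claim can be checked with a small sketch. Everything here is illustrative: `keyArgs`, `writeBase`, `legacyKey`, `currentKey`, and the `schemaVersion` value are hypothetical stand-ins rather than CommitBrief's actual code, and `legacyKey` only assumes that an older release hashed the same base fields in the same order.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"strconv"
)

// schemaVersion is a stand-in value chosen for this sketch.
const schemaVersion = 1

// keyArgs mirrors the fields the article's Compute reads (hypothetical type).
type keyArgs struct {
	Diff, SystemPrompt, Provider, Model, Lang, Mode string
	WithContext                                     bool
}

// writeBase hashes the fields that every version of the key has included.
func writeBase(h hash.Hash, a keyArgs) {
	for _, part := range []string{
		a.Diff, "::", a.SystemPrompt, "::",
		a.Provider, ":", a.Model, ":", a.Lang, ":",
		strconv.Itoa(schemaVersion),
	} {
		h.Write([]byte(part))
	}
}

// legacyKey is what a release predating WithContext and Mode would compute.
func legacyKey(a keyArgs) string {
	h := sha256.New()
	writeBase(h, a)
	return hex.EncodeToString(h.Sum(nil))
}

// currentKey appends the optional markers only when they are set.
func currentKey(a keyArgs) string {
	h := sha256.New()
	writeBase(h, a)
	if a.WithContext {
		h.Write([]byte(":ctx"))
	}
	if a.Mode != "" {
		h.Write([]byte(":mode:" + a.Mode))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	plain := keyArgs{Diff: "+x := 1\n", SystemPrompt: "rules", Provider: "p", Model: "m", Lang: "en"}
	fmt.Println(currentKey(plain) == legacyKey(plain)) // true: old entries still hit

	withCtx := plain
	withCtx.WithContext = true
	fmt.Println(currentKey(withCtx) == currentKey(plain)) // false: new key-space
}
```

Had the markers been written unconditionally (say `:ctx:false`), the first comparison would fail and every pre-existing entry would miss after the upgrade.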
Lookup is a file read, an unmarshal, and two guards:

```go
// Get returns the cached entry for key, or false on any kind of miss.
func (c *Cache) Get(key string) (Entry, bool) {
	path := c.entryPath(key)
	data, err := os.ReadFile(path)
	if err != nil {
		return Entry{}, false
	}
	var e Entry
	if err := json.Unmarshal(data, &e); err != nil {
		_ = os.Remove(path) // corrupt entry: drop it, next write replaces
		return Entry{}, false
	}
	// Guard 1: entries written under another schema version are misses.
	if e.Version != SchemaVersion {
		return Entry{}, false
	}
	// Guard 2: entries past their TTL are misses.
	if e.ExpiredAt(c.now()) {
		return Entry{}, false
	}
	return e, true
}
```

No network, no tokens. And because the lookup happens before the cost preflight in the pipeline, a hit skips the cost estimate altogether: there's nothing to estimate when you're not calling anyone. On an unchanged diff, a re-run is effectively instant and free.

This is the payoff. There is no `invalidateCacheAfterEditingRules` anywhere in the codebase, because it would be dead code. The system prompt is in the key, and your rules are in the system prompt. So the moment you change one line of `COMMITBRIEF.md`, the assembled prompt's bytes change, its SHA-256 changes, and the old entry's key is one nobody will ever compute again. The stale review isn't deleted; it's unreachable, and the next review writes a fresh entry under the new key.

Content addressing means a cache hit is, by construction, a review produced from byte-identical inputs. There's no heuristic deciding whether a cached answer is "still valid," because validity isn't a question you can ask of a content-addressed store: the inputs either hash to the same key or they don't.
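To see "unreachable, not deleted" in miniature, here is a rough sketch that uses an in-memory map in place of the on-disk store. `assemblePrompt` and `key` are hypothetical helpers invented for the illustration, and the key is reduced to just the diff and the system prompt; the real prompt assembly is not shown in this post.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// assemblePrompt is a hypothetical stand-in: the rules text ends up inside the prompt.
func assemblePrompt(rules string) string {
	return "You are a code reviewer.\n" + rules
}

// key is a simplified content address over the diff and the system prompt only.
func key(diff, systemPrompt string) string {
	h := sha256.New()
	h.Write([]byte(diff))
	h.Write([]byte("::"))
	h.Write([]byte(systemPrompt))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	store := map[string]string{} // stands in for the JSON files on disk
	diff := "+ fmt.Println(\"hello\")\n"
	oldRules := "- flag missing error checks\n"
	newRules := "- flag missing error checks\n- flag unused imports\n"

	// First review, cached under the old rules.
	store[key(diff, assemblePrompt(oldRules))] = "review produced under the old rules"

	// Same diff after editing the rules: the lookup misses without any invalidation step.
	_, hit := store[key(diff, assemblePrompt(newRules))]
	fmt.Println(hit) // false

	// The old entry is still stored; it is only reachable with the old inputs.
	_, hit = store[key(diff, assemblePrompt(oldRules))]
	fmt.Println(hit)        // true
	fmt.Println(len(store)) // 1
}
```

Reverting the rules edit makes the old entry reachable again, which is consistent with the point above: the entry was never wrong, only addressed by inputs nobody was computing.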
A cache entry is one JSON file per response:

```go
type Entry struct {
	Version   int       `json:"version"`
	CreatedAt time.Time `json:"created_at"`
	TTL       int64     `json:"ttl"`
	Key       KeyMeta   `json:"key"`
	Result    Result    `json:"result"`
}
```

`Result` carries a `Format` marker (`json`, `markdown-fallback`, or `plain-text`) so a degraded review (post 3: https://dev.to/muhammetsafak/getting-structured-json-out-of-five-incompatible-llm-apis-and-degrading-when-they-ignore-you-27jg) or a CLI provider's pre-formatted output replays down exactly the right renderer path, with no warning re-emitted on a cache hit.

Writes are atomic: serialize to a temp file, then rename into place.

```go
tmp := path + ".tmp"
if err := os.WriteFile(tmp, data, 0o600); err != nil {
	return err
}
return os.Rename(tmp, path)
```

`os.Rename` is atomic on a POSIX filesystem, so a crash mid-write leaves a `.tmp` file, never a half-written entry that would later unmarshal into garbage. Mode `0600` keeps the cached review readable only by you. And the first successful write appends `.commitbrief/` to the repo's `.gitignore`, so your cache never lands in a commit.

Left alone, the cache grows. Two mechanisms keep it in check. If `cache.max_size_mb` is set, an eviction sweep runs after each write, oldest-first by `CreatedAt` (file mtime as fallback), until the total fits. The just-written entry is always protected, so a single review larger than the budget still survives the write that created it. Entries also carry a TTL, defaulting to seven days. And you can prune by hand:

```
commitbrief cache stats                                  # count, size, age, per-provider
commitbrief cache prune --keep-last 500 --older-than 7d
commitbrief cache inspect <key> --show-content           # one entry's metadata + body
```

`prune` keeps an entry only if it's inside both windows: among the newest 500 and younger than seven days.
When CommitBrief does call a provider, a cost preflight runs first: it estimates input tokens at roughly four characters each, guesses output conservatively (floored at 200 tokens, capped at 1500; a structured review rarely runs longer), multiplies by the model's price table, and prompts only if the estimate clears your threshold (`cost.warn_threshold_usd`, default $0.50). A cache hit skips that whole machine. On a paid provider, the second review of an unchanged diff costs literally nothing; on a local Ollama model (post 5: https://dev.to/muhammetsafak/air-gapped-code-review-with-ollama-when-the-diff-never-leaves-the-machine-4kb8) it was already free, but the cache still saves you the inference seconds.

A cache hit replays the first answer verbatim, including its mistakes. The cache makes a re-run free; it does not make it better. If the model missed something the first time, the cached entry will keep missing it until the inputs change or you force a fresh call with `--no-cache`. And the store is deliberately repo-local: `.commitbrief/cache/` on your machine, never a shared team server, because there isn't one. It's the same local-first stance that runs through everything else. The cache saves you tokens and time; it doesn't pretend to be a source of truth.

Repo: github.com/CommitBrief/commitbrief. Part 6 of Building CommitBrief. Next: exposing the whole review pipeline as a Model Context Protocol tool: JSON-RPC over stdio, in standard-library Go, with zero new dependencies.