Caching LLM responses is just content addressing

wpnews.pro

An LLM review costs money and a few seconds of latency. Reviewing the same diff twice should cost neither. CommitBrief caches every review, but the interesting part isn't that it caches — it's that the cache is content-addressed, so a hit is provably the same review, and there is no such thing as a stale one. Editing a single line of your rules file invalidates exactly the entries it should, and not one more, with zero invalidation logic anywhere in the code.

TL;DR

Everything good about this cache falls out of one function. Here it is, complete:

func Compute(args ComputeArgs) string {
    h := sha256.New()
    h.Write([]byte(args.Diff))
    h.Write([]byte("::"))
    h.Write([]byte(args.SystemPrompt))
    h.Write([]byte("::"))
    h.Write([]byte(args.Provider))
    h.Write([]byte(":"))
    h.Write([]byte(args.Model))
    h.Write([]byte(":"))
    h.Write([]byte(args.Lang))
    h.Write([]byte(":"))
    h.Write([]byte(strconv.Itoa(SchemaVersion)))
    if args.WithContext {
        h.Write([]byte(":ctx"))
    }
    if args.Mode != "" {
        h.Write([]byte(":mode:" + args.Mode))
    }
    return hex.EncodeToString(h.Sum(nil))
}
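One caveat if you borrow this pattern: joining fields with "::" and ":" keeps the key unambiguous only while no field can contain a separator, and a diff easily can (C++ and Rust paths are full of "::"). A collision would need one field's bytes to shift into its neighbor, so it is unlikely in practice. Length-prefixing each field removes the question entirely. Here is a minimal sketch of both encodings. It is not CommitBrief's code: joinedKey and framedKey are hypothetical names. Switching an existing cache to this encoding would change every key once, just like a schema bump.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
)

// joinedKey mimics Compute's separator style for two fields:
// diff and prompt are joined with "::" before hashing.
func joinedKey(diff, prompt string) string {
	sum := sha256.Sum256([]byte(diff + "::" + prompt))
	return hex.EncodeToString(sum[:])
}

// framedKey (hypothetical) writes each field's length as a fixed-width
// 8-byte big-endian integer before its bytes, so field boundaries can't
// be confused no matter what the fields contain.
func framedKey(fields ...string) string {
	h := sha256.New()
	var n [8]byte
	for _, f := range fields {
		binary.BigEndian.PutUint64(n[:], uint64(len(f)))
		h.Write(n[:])
		h.Write([]byte(f))
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

With separators, ("a::b", "c") and ("a", "b::c") hash the same byte string, "a::b::c". With length prefixes, they don't.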

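Each input is in the key because each one can change the output:

- Diff: the change under review.
- SystemPrompt: the assembled prompt, which carries your COMMITBRIEF.md rules, the severity rubric, the response-format contract, and any architecture constraints. This is the load-bearing one for invalidation, below.
- Provider and Model: the same diff sent to a different model is a different review.
- Lang: the configured language.
- SchemaVersion: the cache schema version, the same constant Get checks against each entry's Version. Bump it and every key changes, so nothing written under the old format can be hit again.

The two trailing markers are a lesson in not breaking your own cache. :ctx and :mode: are appended only when set. A plain review writes neither, so its key is byte-identical to what the same review produced three versions ago. Adding the --with-context feature and the commit mode didn't invalidate anybody's existing cache. New behavior gets new key-space; unchanged behavior keeps its old keys. That discipline is why upgrades don't silently nuke everyone's cache on the first run.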
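The backward-compatibility trick is easy to see in miniature. In this sketch, hypothetical keyV1 and keyV2 functions stand in for an older and a newer Compute. The markers are written only when an option is set, so the default case hashes exactly the bytes the old function did.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// keyV1 is a hypothetical "older" key function that predates the
// with-context and mode options.
func keyV1(base string) string {
	h := sha256.New()
	h.Write([]byte(base))
	return hex.EncodeToString(h.Sum(nil))
}

// keyV2 adds the new options the way Compute does: a marker is written
// only when its option is set, so the default path is byte-identical to keyV1.
func keyV2(base string, withContext bool, mode string) string {
	h := sha256.New()
	h.Write([]byte(base))
	if withContext {
		h.Write([]byte(":ctx"))
	}
	if mode != "" {
		h.Write([]byte(":mode:" + mode))
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

Writing ":ctx=false" unconditionally would have looked tidier, and it would have silently changed every existing key.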

Lookup is a file read, an unmarshal, and two guards:

func (c *Cache) Get(key string) (Entry, bool) {
    path := c.entryPath(key)
    data, err := os.ReadFile(path)
    if err != nil {
        return Entry{}, false
    }
    var e Entry
    if err := json.Unmarshal(data, &e); err != nil {
        _ = os.Remove(path) // corrupt entry: drop it, next write replaces
        return Entry{}, false
    }
    if e.Version != SchemaVersion {
        return Entry{}, false
    }
    if e.ExpiredAt(c.now()) {
        return Entry{}, false
    }
    return e, true
}

No network, no tokens. And because the lookup happens before the cost preflight in the pipeline, a hit skips the cost estimate altogether — there's nothing to estimate when you're not calling anyone. On an unchanged diff, a re-run is effectively instant and free.

This is the payoff. There is no invalidateCacheAfterEditingRules() anywhere in the codebase, because it would be dead code. The system prompt is in the key, and your rules are in the system prompt. So the moment you change one line of COMMITBRIEF.md, the assembled prompt's bytes change and so does its SHA-256. The old entry's key won't be computed again unless your rules return to exactly those bytes, and in that case the old review is correctly a hit again. The stale review isn't deleted; it's unreachable, and the next review writes a fresh entry under the new key.
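Here is that chain in a few lines: rules feed the prompt, the prompt feeds the hash, and the old key simply stops being produced. This is a sketch only. assemblePrompt and its template are hypothetical stand-ins for the real prompt builder, and the key uses just two of Compute's inputs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// assemblePrompt is a hypothetical stand-in for the real prompt builder:
// the COMMITBRIEF.md rules are pasted verbatim into a fixed template.
func assemblePrompt(rules string) string {
	var b strings.Builder
	b.WriteString("You are a code reviewer.\n## Project rules\n")
	b.WriteString(rules)
	b.WriteString("\n## Respond in JSON.\n")
	return b.String()
}

// keyFor hashes the diff and the assembled prompt, as Compute does for
// its first two fields. No invalidation code exists anywhere.
func keyFor(diff, rules string) string {
	h := sha256.New()
	h.Write([]byte(diff))
	h.Write([]byte("::"))
	h.Write([]byte(assemblePrompt(rules)))
	return hex.EncodeToString(h.Sum(nil))
}
```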

Content addressing means a cache hit is, by construction, a review produced from byte-identical inputs. There's no heuristic deciding whether a cached answer is "still valid," because validity isn't a question you can ask of a content-addressed store — the inputs either hash to the same key or they don't.

A cache entry is one JSON file per response:

type Entry struct {
    Version   int       `json:"version"`
    CreatedAt time.Time `json:"created_at"`
    TTL       int64     `json:"ttl"`
    Key       KeyMeta   `json:"key"`
    Result    Result    `json:"result"`
}

Result carries a Format marker (json, markdown-fallback, or plain-text), so a degraded review (post 3) or a CLI provider's pre-formatted output replays through exactly the right renderer path, with no warning re-emitted on a cache hit. Writes are atomic: serialize to a temp file, then rename into place.

tmp := path + ".tmp"
if err := os.WriteFile(tmp, data, 0o600); err != nil {
    return err
}
return os.Rename(tmp, path)

os.Rename is atomic on POSIX filesystems when source and destination are on the same filesystem. That holds here because the temp file sits next to its target. So a crash mid-write leaves a stray .tmp file, never a half-written entry that would later unmarshal into garbage. Mode 0600 keeps the cached review readable only by you. And the first successful write appends .commitbrief/ to the repo's .gitignore, so your cache never lands in a commit.

Left alone, the cache grows. Two mechanisms keep it in check. If cache.max_size_mb is set, an eviction sweep runs after each write, deleting entries oldest-first by CreatedAt (file mtime as fallback) until the total fits. The just-written entry is always protected, so a single review larger than the budget still survives the write that created it. Entries also carry a TTL, defaulting to seven days. And you can prune by hand:

commitbrief cache stats                          # count, size, age, per-provider
commitbrief cache prune --keep-last 500 --older-than 7d
commitbrief cache inspect <key> --show-content   # one entry's metadata + body

prune keeps an entry only if it's inside both windows: among the newest 500 and younger than seven days.

When CommitBrief does call a provider, a cost preflight runs first. It estimates input tokens at roughly four characters each and guesses output conservatively (floored at 200 tokens and capped at 1500, since a structured review rarely runs longer). It multiplies both by the model's price table and prompts only if the estimate exceeds your threshold (cost.warn_threshold_usd, default $0.50). A cache hit skips that whole machine. On a paid provider, the second review of an unchanged diff costs literally nothing. On a local Ollama model (post 5) it was already free, but the cache still saves you the inference seconds.
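Here is the preflight arithmetic, sketched under stated assumptions. The post gives the four-characters-per-token rule and the [200, 1500] clamp but not what the output guess is clamped from. This sketch uses half the input estimate, which is an assumption. price, estimateUSD, and needsConfirmation are hypothetical names, and the per-million prices in the test are made up.

```go
package main

import "unicode/utf8"

// price is a hypothetical price-table row in USD per million tokens.
type price struct {
	InputPerM, OutputPerM float64
}

// estimateUSD: about four characters per input token, and an output guess
// clamped to [200, 1500] tokens. Deriving the guess from half the input is
// this sketch's assumption, not a documented CommitBrief rule.
func estimateUSD(prompt string, p price) (usd float64, inTok, outTok int) {
	chars := utf8.RuneCountInString(prompt)
	inTok = (chars + 3) / 4 // round up so a short prompt never estimates to zero
	outTok = inTok / 2
	if outTok < 200 {
		outTok = 200
	}
	if outTok > 1500 {
		outTok = 1500
	}
	usd = float64(inTok)/1e6*p.InputPerM + float64(outTok)/1e6*p.OutputPerM
	return usd, inTok, outTok
}

// needsConfirmation reports whether the user should be asked before calling.
func needsConfirmation(usd, thresholdUSD float64) bool {
	return usd > thresholdUSD
}
```

At a hypothetical $3 input and $15 output per million tokens, a 40,000-character prompt estimates to 10,000 input tokens plus the 1,500-token output cap, about $0.05. That is well under the $0.50 default, so no prompt appears.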

A cache hit replays the first answer verbatim, including its mistakes. The cache makes a re-run free; it does not make it better. If the model missed something the first time, the cached entry will keep missing it until the inputs change, the entry expires, or you force a fresh call with --no-cache. And the store is deliberately repo-local: .commitbrief/cache/ on your machine, never a shared team server, because there isn't one. It's the same local-first stance that runs through everything else. The cache saves you tokens and time; it doesn't pretend to be a source of truth.
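The ordering that makes a hit free (lookup, then preflight, then provider) and the --no-cache escape hatch fit in a few lines. This is a minimal sketch, not the real pipeline. Pipeline, Provider, countingProvider, and the map-backed cache are hypothetical stand-ins. Whether the real --no-cache also refreshes the stored entry is an assumption.

```go
package main

// Provider is a hypothetical stand-in for a review backend.
type Provider interface {
	Review(prompt string) (string, error)
}

// countingProvider is a fake backend that records how often it is called.
type countingProvider struct{ calls int }

func (c *countingProvider) Review(prompt string) (string, error) {
	c.calls++
	return "review of: " + prompt, nil
}

// Pipeline wires a cache, a cost preflight, and a provider together.
type Pipeline struct {
	Cache     map[string]string         // key -> review; stands in for the file store
	Provider  Provider
	Preflight func(prompt string) error // cost check; a non-nil error aborts
	NoCache   bool                      // mirrors --no-cache
}

// Run consults the cache before anything that costs time or money.
func (p *Pipeline) Run(key, prompt string) (string, error) {
	if !p.NoCache {
		if r, ok := p.Cache[key]; ok {
			return r, nil // hit: no preflight, no network, no tokens
		}
	}
	if err := p.Preflight(prompt); err != nil {
		return "", err
	}
	r, err := p.Provider.Review(prompt)
	if err != nil {
		return "", err
	}
	p.Cache[key] = r // assumption: a forced call also refreshes the entry
	return r, nil
}
```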

Repo: github.com/CommitBrief/commitbrief.

Part 6 of Building CommitBrief. Next: exposing the whole review pipeline as a Model Context Protocol tool — JSON-RPC over stdio, in standard-library Go, with zero new dependencies.

Source: dev.to (original article).

