# Caching LLM responses is just content addressing

> Source: <https://dev.to/muhammetsafak/caching-llm-responses-is-just-content-addressing-2102>
> Published: 2026-06-29 00:00:00+00:00

**An LLM review costs money and a few seconds of latency. Reviewing the same diff twice should cost neither.** CommitBrief caches every review, but the interesting part isn't *that* it caches — it's that the cache is content-addressed, so a hit is provably the same review, and there is no such thing as a stale one. Editing a single line of your rules file invalidates exactly the entries it should, and not one more, with zero invalidation logic anywhere in the code.

**TL;DR**

Everything good about this cache falls out of one function. Here it is, complete:

```
func Compute(args ComputeArgs) string {
    h := sha256.New()
    h.Write([]byte(args.Diff))
    h.Write([]byte("::"))
    h.Write([]byte(args.SystemPrompt))
    h.Write([]byte("::"))
    h.Write([]byte(args.Provider))
    h.Write([]byte(":"))
    h.Write([]byte(args.Model))
    h.Write([]byte(":"))
    h.Write([]byte(args.Lang))
    h.Write([]byte(":"))
    h.Write([]byte(strconv.Itoa(SchemaVersion)))
    if args.WithContext {
        h.Write([]byte(":ctx"))
    }
    if args.Mode != "" {
        h.Write([]byte(":mode:" + args.Mode))
    }
    return hex.EncodeToString(h.Sum(nil))
}
```

Each input is in the key because each one can change the output.

The system prompt is assembled from your `COMMITBRIEF.md` rules, the severity rubric, the response-format contract, and any architecture constraints. This is the load-bearing one for invalidation, below.

The schema version is currently `1`. Bump it and every existing key changes, because the version is hashed into the key.

The two trailing markers are a lesson in not breaking your own cache. `:ctx` and `:mode:` are appended *only when set*. A plain review writes neither, so its key is byte-identical to what the same review produced three versions ago: adding the `--with-context` feature and the `commit` mode didn't invalidate anybody's existing cache. New behavior gets new key-space; unchanged behavior keeps its old keys. That discipline is why upgrades don't silently nuke everyone's cache on the first run.
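To make the "only when set" rule concrete, here is a minimal sketch. `keyOf` is a simplified stand-in for `Compute` that collapses all the required inputs into one `base` string, and `legacyKeyOf` is a hypothetical model of an older release that predates both options; neither name exists in CommitBrief.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// keyOf is a simplified stand-in for Compute: it hashes the required inputs
// (collapsed into base here) and appends the optional markers only when the
// matching option is set.
func keyOf(base string, withContext bool, mode string) string {
	h := sha256.New()
	h.Write([]byte(base))
	if withContext {
		h.Write([]byte(":ctx"))
	}
	if mode != "" {
		h.Write([]byte(":mode:" + mode))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// legacyKeyOf models a hypothetical older release that had neither option.
func legacyKeyOf(base string) string {
	sum := sha256.Sum256([]byte(base))
	return hex.EncodeToString(sum[:])
}

func main() {
	base := "diff::prompt::provider:model:en:1"

	// A plain review hashes exactly the bytes the old release hashed.
	fmt.Println(keyOf(base, false, "") == legacyKeyOf(base)) // true

	// Turning an option on moves the review into fresh key-space.
	fmt.Println(keyOf(base, true, "") == legacyKeyOf(base))      // false
	fmt.Println(keyOf(base, false, "commit") == legacyKeyOf(base)) // false
}
```

Had the markers been written unconditionally (for example `:ctx:false`), the first comparison would be false and every pre-existing entry would miss after the upgrade.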

Lookup is a file read, an unmarshal, and two guards:

```
func (c *Cache) Get(key string) (Entry, bool) {
    path := c.entryPath(key)
    data, err := os.ReadFile(path)
    if err != nil {
        return Entry{}, false
    }
    var e Entry
    if err := json.Unmarshal(data, &e); err != nil {
        _ = os.Remove(path) // corrupt entry: drop it, next write replaces
        return Entry{}, false
    }
    if e.Version != SchemaVersion {
        return Entry{}, false
    }
    if e.ExpiredAt(c.now()) {
        return Entry{}, false
    }
    return e, true
}
```

No network, no tokens. And because the lookup happens *before* the cost preflight in the pipeline, a hit skips the cost estimate altogether — there's nothing to estimate when you're not calling anyone. On an unchanged diff, a re-run is effectively instant and free.

This is the payoff. There is no `invalidateCacheAfterEditingRules()` anywhere in the codebase, because it would be dead code. The system prompt is *in* the key, and your rules are *in* the system prompt. So the moment you change one line of `COMMITBRIEF.md`, the assembled prompt's bytes change, its SHA-256 changes, and the old entry's key is one nobody will ever compute again. The stale review isn't deleted; it's unreachable, and the next review writes a fresh entry under the new key.

Content addressing means a cache hit is, by construction, a review produced from byte-identical inputs. There's no heuristic deciding whether a cached answer is "still valid," because validity isn't a question you can ask of a content-addressed store — the inputs either hash to the same key or they don't.
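A small sketch of that chain, from rules edit to new key. `assemblePrompt` is a hypothetical stand-in for CommitBrief's prompt assembly, and `key` hashes only the diff and the prompt to keep the example short.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// assemblePrompt is a hypothetical stand-in for prompt assembly: the rules
// file content is embedded verbatim in the system prompt.
func assemblePrompt(rules string) string {
	return "You are a code reviewer.\n\n## Project rules\n" + rules
}

// key hashes the diff and the assembled prompt, using the same "::"
// separator as Compute. The other inputs are left out for brevity.
func key(diff, systemPrompt string) string {
	h := sha256.New()
	h.Write([]byte(diff))
	h.Write([]byte("::"))
	h.Write([]byte(systemPrompt))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	diff := "- old line\n+ new line\n"

	before := key(diff, assemblePrompt("Flag missing error checks.\n"))
	again := key(diff, assemblePrompt("Flag missing error checks.\n"))
	after := key(diff, assemblePrompt("Flag missing error checks and naked returns.\n"))

	fmt.Println(before == again) // true: identical inputs, identical key
	fmt.Println(before == after) // false: the edited rules yield a new key
}
```

Nothing in this sketch deletes or marks the old entry; the lookup simply never asks for `before` again once the rules have changed.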

A cache entry is one JSON file per response:

```
type Entry struct {
    Version   int       `json:"version"`
    CreatedAt time.Time `json:"created_at"`
    TTL       int64     `json:"ttl"`
    Key       KeyMeta   `json:"key"`
    Result    Result    `json:"result"`
}
```

`Result` carries a `Format` marker (`json`, `markdown-fallback`, or `plain-text`) so a degraded review ([post 3](https://dev.to/muhammetsafak/getting-structured-json-out-of-five-incompatible-llm-apis-and-degrading-when-they-ignore-you-27jg)) or a CLI provider's pre-formatted output replays down exactly the right renderer path, with no warning re-emitted on a cache hit. Writes are atomic: serialize to a temp file, then rename into place.

```
tmp := path + ".tmp"
if err := os.WriteFile(tmp, data, 0o600); err != nil {
    return err
}
return os.Rename(tmp, path)
```

`os.Rename` is atomic on a POSIX filesystem, so a crash mid-write leaves a `.tmp` file, never a half-written entry that would later unmarshal into garbage. Mode `0600` keeps the cached review readable only by you. And the first successful write appends `.commitbrief/` to the repo's `.gitignore`, so your cache never lands in a commit.

Left alone, the cache grows. Two mechanisms keep it in check. If `cache.max_size_mb` is set, an eviction sweep runs *after* each write, oldest-first by `CreatedAt` (file mtime as fallback), until the total fits. The just-written entry is always protected, so a single review larger than the budget still survives the write that created it. Entries also carry a TTL, defaulting to seven days. And you can prune by hand:

```
commitbrief cache stats                          # count, size, age, per-provider
commitbrief cache prune --keep-last 500 --older-than 7d
commitbrief cache inspect <key> --show-content   # one entry's metadata + body
```

`prune` keeps an entry only if it's inside *both* windows: among the newest 500 *and* younger than seven days.

When CommitBrief does call a provider, a cost preflight runs first: it estimates input tokens at roughly four characters each, guesses output conservatively (floored at 200 tokens, capped at 1500, since a structured review rarely runs longer), multiplies by the model's price table, and prompts only if the estimate clears your threshold (`cost.warn_threshold_usd`, default `$0.50`). A cache hit skips that whole machine. On a paid provider, the second review of an unchanged diff costs literally nothing; on a local Ollama model ([post 5](https://dev.to/muhammetsafak/air-gapped-code-review-with-ollama-when-the-diff-never-leaves-the-machine-4kb8)) it was already free, but the cache still saves you the inference seconds.
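As a rough illustration of the preflight arithmetic, here is a sketch under stated assumptions. The four-characters-per-token ratio, the 200/1500 bounds, and the `$0.50` default come from the text above. The prices in `main` and the "output is a quarter of input" guess are made up for the example, and `Price`, `estimateCost`, and `shouldPrompt` are hypothetical names.

```go
package main

import "fmt"

// Price holds hypothetical USD prices per million tokens.
type Price struct {
	InputPerMTok  float64
	OutputPerMTok float64
}

// clamp bounds v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// estimateCost approximates the USD cost of one review.
func estimateCost(promptChars int, p Price) float64 {
	inTok := (promptChars + 3) / 4      // ~4 characters per token, rounded up
	outTok := clamp(inTok/4, 200, 1500) // assumption: output is about a quarter of input
	return float64(inTok)/1e6*p.InputPerMTok + float64(outTok)/1e6*p.OutputPerMTok
}

// shouldPrompt reports whether the estimate exceeds the warning threshold.
func shouldPrompt(estimateUSD, thresholdUSD float64) bool {
	return estimateUSD > thresholdUSD
}

func main() {
	price := Price{InputPerMTok: 3, OutputPerMTok: 15} // made-up prices

	small := estimateCost(4_000, price)     // 1,000 input tokens, 250 output tokens
	large := estimateCost(4_000_000, price) // 1,000,000 input tokens, output capped at 1,500

	fmt.Printf("small: $%.5f prompt=%v\n", small, shouldPrompt(small, 0.50))
	fmt.Printf("large: $%.5f prompt=%v\n", large, shouldPrompt(large, 0.50))
}
```

With these made-up prices the small diff stays far below the threshold and runs without a prompt, while the large one clears it and would ask for confirmation.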

A cache hit replays the first answer verbatim, including its mistakes. The cache makes a re-run free; it does not make it *better*. If the model missed something the first time, the cached entry will keep missing it until the inputs change or you force a fresh call with `--no-cache`. And the store is deliberately repo-local: `.commitbrief/cache/` on your machine, never a shared team server, because there isn't one. That is the same local-first stance that runs through everything else. The cache saves you tokens and time; it doesn't pretend to be a source of truth.

Repo: **github.com/CommitBrief/commitbrief**.

*Part 6 of **Building CommitBrief**. Next: exposing the whole review pipeline as a Model Context Protocol tool — JSON-RPC over stdio, in standard-library Go, with zero new dependencies.*