# I Taught My AI to Read Runbooks to Stop Guessing Incident Fixes

> Source: <https://pub.towardsai.net/i-taught-my-ai-to-read-runbooks-to-stop-guessing-incident-fixes-325d1f4c28fb?source=rss----98111c9905da---4>
> Published: 2026-06-30 03:38:10+00:00

Your team already wrote the fix. It’s in a runbook somebody added two years ago, in a wiki you haven’t opened since. The outage paging you right now is the exact one it describes. You won’t find it. You’ll grep Slack, ping a teammate, and rediscover the fix from scratch.

The problem isn’t that the knowledge is missing. It’s that keyword search can’t find it. The runbook says “connection pool exhausted.” The alert says “too many clients already.” Same incident, zero shared words.

This post walks through the AI that closes that gap: how embeddings turn text into meaning, how retrieval-augmented generation grounds an answer in your own docs, and how a read-only tool ties it together during a real investigation.

Keyword search matches characters. “Disk full” finds documents containing “disk” and “full.” It does not find “volume at 95% capacity,” even though that’s the same problem stated by a different engineer on a different day. Production text is full of synonyms, abbreviations, and error strings that nobody planned for. Lexical matching breaks on all of it.

You need search that matches *meaning*, not spelling. That’s what embeddings give you.

An embedding model takes a chunk of text and returns a list of numbers — a vector, usually a few hundred to a couple thousand dimensions. The model is trained so that text with similar meaning lands close together in that space, and unrelated text lands far apart.

“Postgres connection pool exhausted” and “too many clients already, max connections reached” produce vectors that point in nearly the same direction. “Cache warm-up after deploy” points somewhere else entirely. The words barely overlap; the vectors do.

So similarity becomes geometry. To find the closest match, you measure the angle between two vectors with cosine similarity:

```
cosine(a, b) = (a ⋅ b) / (‖a‖ × ‖b‖)
```

The result sits between -1 and 1. Near 1 means “almost the same direction,” that is, the same meaning; near 0 means unrelated. You embed every document once, embed the query at search time, score each document, and keep the top few. That’s a top-K nearest-neighbor search, and it’s the whole engine.
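To make the scoring concrete, here is a minimal sketch of cosine similarity and a brute-force top-K search in plain Python. The three-dimensional vectors and the helper names `cosine` and `top_k` are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Score every document against the query and keep the k best."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]

# Hand-made toy vectors (assumption: not output from any real model).
docs = {
    "postgres-pool-exhausted.md": [0.9, 0.1, 0.0],
    "cache-warmup.md": [0.0, 0.2, 0.9],
    "disk-pressure.md": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]
results = top_k(query, docs, k=2)
for score, doc_id in results:
    print(f"{score:.3f}  {doc_id}")
```

A linear scan like this is usually fine for a few thousand runbooks; approximate nearest-neighbor indexes only start to matter at much larger corpus sizes.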

A language model trained last year has never seen your runbooks. Asking “how do I fix the api service?” gets you a plausible-sounding paragraph that may be wrong — it’s guessing from general knowledge.

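Retrieval-augmented generation flips that. Instead of trusting the model’s memory:

1. Retrieve the documents most relevant to the question.
2. Put them in the prompt alongside the question.
3. Have the model answer from those documents.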

The model stops inventing and starts citing. Its answer is only as good as the docs you retrieved — which is exactly what you want for incident response, where a wrong fix is worse than no fix. The hard, valuable part is step 1: retrieving the *right* documents.
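The grounding step can be sketched as plain prompt assembly. This is an illustrative sketch, not how any particular agent builds its prompt: the function name `build_grounded_prompt`, the instruction wording, and the `title`/`body` keys are all assumptions.

```python
def build_grounded_prompt(question, retrieved_docs):
    """Assemble a prompt that asks the model to answer only from the given excerpts."""
    sources = []
    for number, doc in enumerate(retrieved_docs, start=1):
        # Number each excerpt so the model can cite it.
        sources.append(f"[{number}] {doc['title']}\n{doc['body']}")
    context = "\n\n".join(sources)
    return (
        "Answer the question using only the runbook excerpts below. "
        "Cite excerpts by their bracketed number. "
        "If the excerpts do not cover the question, say so.\n\n"
        f"Runbook excerpts:\n{context}\n\n"
        f"Question: {question}"
    )

docs = [
    {
        "title": "Postgres connection pool exhausted",
        "body": "Kill idle transactions older than 5m.",
    }
]
prompt = build_grounded_prompt("How do I fix 'too many clients already'?", docs)
print(prompt)
```

The explicit "say so" instruction matters for incident response: an empty retrieval should produce an honest "no runbook found" rather than a guess.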

Theory is clean. Production is not. Three things bite you when you wire RAG into an incident workflow.

**You embed once, search forever.** Embedding is the expensive call. So you embed the corpus when a doc is added or changed, cache the vector, and never recompute it for unchanged text. A content hash per document tells you what actually changed. Re-indexing 500 runbooks where one was edited should cost one embedding call, not 500.
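A minimal sketch of that hash-gated re-indexing, assuming SHA-256 as the content hash and an in-memory dict as the cache. The `embed` argument is a hypothetical stand-in for the real model call, and `fake_embed` exists only to make the example runnable.

```python
import hashlib

def content_hash(text):
    """SHA-256 of the text that would be embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def reindex(docs, cache, embed):
    """Embed only new or changed docs; return how many embedding calls were made.

    docs:  {doc_id: text}
    cache: {doc_id: (hash, vector)}, updated in place
    embed: callable text -> vector (hypothetical stand-in for the model call)
    """
    calls = 0
    for doc_id, text in docs.items():
        digest = content_hash(text)
        cached = cache.get(doc_id)
        if cached is not None and cached[0] == digest:
            continue  # unchanged: reuse the cached vector
        cache[doc_id] = (digest, embed(text))
        calls += 1
    return calls

def fake_embed(text):
    # Placeholder "embedding" so the sketch runs without a model.
    return [float(len(text))]

cache = {}
docs = {"a.md": "disk full", "b.md": "pool exhausted"}
first = reindex(docs, cache, fake_embed)    # both docs are new
docs["b.md"] = "pool exhausted, raise max_connections"
second = reindex(docs, cache, fake_embed)   # only b.md changed
third = reindex(docs, cache, fake_embed)    # nothing changed
print(first, second, third)
```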

**The query is messy.** A real query isn’t a tidy sentence — it’s derived from the incident: log lines, stack traces, service names. That text can carry secrets, tokens, and customer PII. Embedding sends it to a model endpoint, which is an external boundary. So you scrub the query *before* it leaves the box, and scrub the excerpts on the way back. Redaction first, embedding second — not the other way around.
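As an illustration of scrubbing before embedding, the sketch below applies a few regular expressions to a log line. The three patterns are assumptions chosen for the example; a real redactor needs a broader, tested rule set and will still miss things.

```python
import re

# Illustrative patterns only (assumption: not a complete or production rule set).
PATTERNS = [
    # key=value or key: value pairs whose key looks like a credential
    (re.compile(r"\b(password|token|api[_-]?key|secret)\s*[=:]\s*\S+", re.IGNORECASE),
     r"\1=[REDACTED]"),
    # email addresses
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
    # IPv4 addresses
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
]

def scrub(text):
    """Apply every redaction pattern in order and return the cleaned text."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

raw = "login failed for alice@example.com from 10.0.0.12 token=abc123"
clean = scrub(raw)
print(clean)
```

Running the same `scrub` over returned excerpts covers the other direction, since runbooks can also contain pasted credentials.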

**Scope the search.** A 5,000-document corpus has noise. If you already know the incident is on the api service, filtering to that service before scoring cuts false matches and sharpens the top-K. Meaning-based search plus a cheap exact filter beats either alone.
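A small sketch of filter-then-score, with made-up two-dimensional vectors and a hypothetical `scoped_search` helper. It assumes each record carries a single `service` value, and it drops records with a different or missing service whenever a filter is given.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def scoped_search(query_vec, records, service=None, limit=3):
    """Apply the cheap exact filter first, then rank the survivors by cosine."""
    candidates = [r for r in records if service is None or r["service"] == service]
    ranked = sorted(candidates, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return ranked[:limit]

# Toy records (assumption: vectors are hand-written, not model output).
records = [
    {"id": "postgres-pool-exhausted.md", "service": "api", "vector": [0.9, 0.1]},
    {"id": "worker-queue-backlog.md", "service": "worker", "vector": [0.95, 0.05]},
    {"id": "redis-evictions.md", "service": "api", "vector": [0.1, 0.9]},
]
query = [1.0, 0.0]
unscoped = scoped_search(query, records, limit=1)
scoped = scoped_search(query, records, service="api", limit=2)
print(unscoped[0]["id"], [r["id"] for r in scoped])
```

In this toy data the unscoped search ranks a runbook for the wrong service first, which is the false match the filter removes.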

The shape that survives contact with production:

```
ingest:  read docs → hash → embed new/changed → cache vectors
search:  scrub query → embed → cosine top-K → scrub excerpts → return
```

Boring, cheap, and it finds the disk doc when the alert says “volume at 95%.”

A runbook is a plain Markdown file. The only convention is optional YAML front-matter at the top to tag it — everything below is the fix:

```
---
title: Postgres connection pool exhausted
service: api
tags: [database, latency]
---
When the api service reports "too many clients already", the pool is saturated.

1. Check active connections: SELECT count(*) FROM pg_stat_activity;
2. Kill idle transactions older than 5m.
3. If sustained, raise max_connections and recycle the pool.
```

title and service aren't decorations. The title joins the body in what gets embedded, and the service becomes the cheap exact filter from above. Tags are metadata for later.

Ingestion peels off the front matter and turns each file into a record. If there’s no *title*, it falls back to the first H1, then to the filename — so a bare markdown file still indexes. The text that gets embedded is *title + body*, hashed so an unchanged file is skipped next time:

```
parse(file):
    front_matter, body = split off the "--- ... ---" block at the top
    title = front_matter.title
            or first "# heading" in body
            or the filename            # always something to show
    return record {
        id      = file path            # stable handle for updates
        title   = title
        service = front_matter.service # the cheap exact filter
        body    = trimmed body
    }

index(record):
    text = title + body                # title carries weight, so embed it too
    hash = checksum(text)
    if hash == cached.hash: reuse cached vector   # unchanged → skip embedding
    else:                   vector = embed(text); cache(vector, hash)
```
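The parsing half of that pseudocode could look roughly like this in Python. It is a sketch under stated assumptions: the front matter is read as flat `key: value` lines rather than with a real YAML parser, and the function name `parse_runbook` and the record keys are illustrative, not the project's actual code.

```python
import os

def parse_runbook(path, text):
    """Split optional front matter from the body and pick a title with fallbacks."""
    meta, body = {}, text
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":  # closing delimiter found
                for line in lines[1:i]:
                    key, sep, value = line.partition(":")
                    if sep:
                        meta[key.strip()] = value.strip()
                body = "\n".join(lines[i + 1:])
                break
    body = body.strip()

    title = meta.get("title")
    if not title:
        # Fall back to the first H1 in the body.
        for line in body.splitlines():
            if line.startswith("# "):
                title = line[2:].strip()
                break
    if not title:
        # Last resort: the filename, so there is always something to show.
        title = os.path.basename(path)

    return {"id": path, "title": title, "service": meta.get("service"), "body": body}

tagged = parse_runbook(
    "runbooks/pg.md",
    "---\ntitle: Postgres connection pool exhausted\nservice: api\n---\nKill idle transactions.",
)
bare = parse_runbook("runbooks/disk-pressure.md", "# Disk pressure\nFree space on the volume.")
untitled = parse_runbook("runbooks/notes.md", "Just some text.")
print(tagged["title"], "|", bare["title"], "|", untitled["title"])
```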

The search tool itself is small. It holds three things — an embedder, an index, and a redactor — and runs them in order: scrub the query, turn it into a vector, fetch the closest runbooks, and hand them back. That’s the whole tool:

```
tool FindRunbook:
    embedder    # turns text into a vector
    index       # cosine top-K over the embedded corpus
    redactor    # strips secrets/PII before anything egresses

    invoke(query, service, limit):
        clean  = redactor.scrub(query)                 # 1. never send raw incident text out
        vector = embedder.embed(clean)                 # 2. one network call → one vector
        hits   = index.search(vector, service, limit)  # 3. rank corpus by cosine
        return { found: hits is not empty, matches: hits }   # 4. hand back, no writes
```
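One way to express the same shape in Python is a small class with its three collaborators injected as callables. The stubs below are placeholders invented for this sketch; the point is only the ordering, where the embedder never sees unscrubbed text.

```python
class FindRunbook:
    """Read-only tool: scrub, embed, search, return."""

    def __init__(self, embedder, index, redactor):
        self.embedder = embedder  # callable: text -> vector
        self.index = index        # callable: (vector, service, limit) -> list of hits
        self.redactor = redactor  # callable: text -> scrubbed text

    def invoke(self, query, service=None, limit=3):
        clean = self.redactor(query)               # 1. scrub before anything leaves
        vector = self.embedder(clean)              # 2. embed the cleaned query
        hits = self.index(vector, service, limit)  # 3. ranked, scoped search
        return {"found": bool(hits), "matches": hits}  # 4. no writes anywhere

# Stubs (assumptions) so the sketch runs without a model or a real index.
seen_by_embedder = []

def stub_redactor(text):
    return text.replace("hunter2", "[REDACTED]")

def stub_embedder(text):
    seen_by_embedder.append(text)  # record what would have been sent out
    return [1.0, 0.0]

def stub_index(vector, service, limit):
    return [{"id": "postgres-pool-exhausted.md", "service": service}][:limit]

tool = FindRunbook(stub_embedder, stub_index, stub_redactor)
result = tool.invoke("pool exhausted password hunter2", service="api", limit=1)
print(result)
```

Injecting the collaborators keeps the tool testable: swapping the embedder for a local model or the index for a different store does not touch `invoke`.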

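Line by line, this is the entire idea:

1. Scrub the query so raw incident text never leaves the box.
2. Embed the cleaned query: one network call, one vector.
3. Rank the corpus by cosine similarity, scoped to the service.
4. Hand back the matches without writing anything.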

Four lines. Everything else in this post is the details behind them.

So far, this is theory. The good news: you don’t have to build it. [Versus Incident](https://github.com/VersusControl/versus-incident), an open-source incident router, ships exactly this as a tool called *find_runbook*. Let's turn it on against runbooks you already have.

*find_runbook* is one tool in the SRE agent's investigation toolkit. When an incident fires, the agent reaches for tools: read logs, check recent deploys, and, if it's enabled, search your runbooks. That last one is *find_runbook*: it takes a description of the problem, runs the embedding search described above, and hands back the closest-matching runbook excerpts. It never changes anything; it only reads.

Versus runs as a Docker container. Everything lives in two host folders you mount in: config/ for the YAML and data/ for state. Your runbooks are just markdown files under data/runbooks/. On your machine:

```
my-versus/
  config/
    config.yaml      # the server config (channels, AI, redaction)
    tools.yaml       # per-tool config, where you enable find_runbook
  data/
    runbooks/        # ← drop your markdown runbooks here
      postgres-pool-exhausted.md
      redis-evictions.md
      disk-pressure.md
```

config/ mounts to /app/config in the container, data/ to /app/data — so the container reads runbooks at /app/data/runbooks. No schema, no reformatting: a bare markdown file indexes off its first # heading; add service: api front-matter only where you want the agent to scope by service. Add a file, and you've added knowledge.

find_runbook runs inside the agent's AI analyze mode, so AI has to be switched on through environment variables, and the tool has to be named in config/tools.yaml. No model, no tool:

```
# config/tools.yaml
tools:
  find_runbook:
    embedding_model: text-embedding-3-small   # empty = tool stays off
```

There’s no second credential — embeddings reuse the same AI key that the agent analyzes with. Want embeddings to stay in-network? Set agent.ai.provider: ollama in config.yaml and the text never leaves your infra. No code change.

Drop this docker-compose.yml next to your config/ and data/ folders. Redis is the one dependency — the agent uses it to remember where it left off:

```
services:
  redis:
    image: redis:7
    ports: ["6379:6379"]
  versus:
    image: ghcr.io/versuscontrol/versus-incident
    ports: ["3000:3000"]
    environment:
      GATEWAY_SECRET: change-me
      AGENT_ENABLE: "true"
      AGENT_AI_ENABLE: "true"
      AGENT_AI_API_KEY: ${OPENAI_API_KEY}   # reused for embeddings
      REDIS_HOST: redis
      REDIS_PORT: "6379"
    volumes:
      - ./config:/app/config    # config.yaml + tools.yaml
      - ./data:/app/data        # runbooks live in ./data/runbooks
    depends_on: [redis]
```

Run docker compose up and it ingests at boot: it scans data/runbooks, embeds new or edited files, and persists the vectors. The log confirms it:

```
agent: find_runbook: ingested 6 runbook(s) from ./data/runbooks
agent: find_runbook enabled model=text-embedding-3-small runbooks=6
```

Restarts are incremental: unchanged files reuse cached vectors, so a no-edit restart makes zero embedding calls. The index is in-memory; no vector database to run. You can also upload runbooks live from the admin UI at localhost:3000 — uploads rebuild the index without a restart.

Now an alert lands: FATAL: remaining connection slots are reserved... error rate climbing. The agent decides it needs your remediation steps and calls the tool on its own; nobody has to pick it:

```
{
  "query": "postgres connection slots reserved, pool exhausted on api",
  "service": "api",
  "limit": 3
}
```

The query is scrubbed, embedded, filtered to api, and scored by cosine against the remaining runbooks, and the match comes back:

```
{
  "found": true,
  "data": {
    "count": 1,
    "matches": [{
      "id": "postgres-pool-exhausted.md",
      "title": "Postgres connection pool exhausted",
      "service": "api",
      "score": 0.89,
      "excerpt": "1. Check active connections: SELECT count(*)...; 2. Terminate idle-in-transaction; 3. Roll back the api deploy."
    }]
  }
}
```

The runbook says “pool exhausted.” The alert said “slots are reserved.” Almost no shared keywords, 0.89 similarity. The agent folds those exact steps into its conclusion, and the call shows up in the analysis result’s **Tool calls** section, so you can audit which runbook grounded the finding. A human still runs the fix.

The payoff scales with what you already have: every postmortem you write is one more file the agent surfaces next time. The wiki stops being where knowledge goes to die.

Runbook search isn’t magic. Embeddings turn meaning into vectors, cosine similarity ranks them, and RAG feeds the winners to the model so it cites instead of guessing.

[I Taught My AI to Read Runbooks to Stop Guessing Incident Fixes](https://pub.towardsai.net/i-taught-my-ai-to-read-runbooks-to-stop-guessing-incident-fixes-325d1f4c28fb) was originally published in [Towards AI](https://pub.towardsai.net) on Medium.