{"slug": "hw-to-make-sure-ai-agents-neer-forget-the-work", "title": "How to make sure AI Agents never forget the Work?", "summary": "A developer released two open-source templates for Claude Code and OpenAI Codex that prevent AI coding agents from forgetting project context by storing rules and decisions in files and automatically injecting them via hooks. The templates address common failures like context loss between sessions, silent errors, and repeated mistakes by forcing recall at session start, on every message, and before edits. This ensures agents consistently follow project-specific rules without relying on model memory.", "body_md": "A coding agent forgets. Halfway through a build the context compacts, and suddenly it's re-asking things you settled an hour ago or quietly ignoring a rule you set on day one. These are two drop-in templates — one for Claude Code, one for Codex — that fix that by keeping the project's memory in files and feeding it back to the agent automatically.\n\n```\nai-coding-project-templates/\n├─ claude-project-template/   for Claude Code   (rulebook = CLAUDE.md,  hook via .claude/settings.json)\n└─ codex-project-template/    for OpenAI Codex  (rulebook = AGENTS.md,  hook via .codex/hooks.json)\n```\n\nBoth folders are the same system. 
The only differences are the rulebook filename and how each agent registers hooks.\n\n- [What problem do these solve?](#what-problem-do-these-solve)\n- [Pick your agent](#pick-your-agent)\n- [Install (per project)](#install-per-project)\n- [How recall is forced, not hoped-for](#how-recall-is-forced-not-hoped-for)\n- [Looking things up by meaning (recall)](#looking-things-up-by-meaning-recall)\n- [Staying pointed at the goal](#staying-pointed-at-the-goal)\n- [The decision tree (see it, and roll back to it)](#the-decision-tree)\n- [What's inside each template](#whats-inside-each-template)\n- [Optional companion: graphify](#optional-companion-graphify)\n- [License](#license)\n\nFour things that go wrong on any long-running agent session, and what the template does about each:\n\n```\nWithout                                 With\n───────                                 ────\nForgets between sessions; you           Every message, decision and change lands in DOCS/,\nre-explain the project constantly.      so a fresh session reloads the full context.\n\n\"Done\" means \"the code exists,\"         \"Done\" means a passing test or a real run (E0–E5),\neven if it never ran.                   not just code that imports.\n\nSame error retried forever.             Three strikes on the same failure → stop and diagnose.\n\nSilent fallbacks; lost reasoning.       Fallbacks must be stated. Decisions and failures are\n                                        logged with the test that stops them returning.\n```\n\n| You use | Open this folder | Rulebook |\n|---|---|---|\n| Claude Code (CLI / VS Code / desktop) | `claude-project-template/` | `CLAUDE.md` |\n| OpenAI Codex (CLI / VS Code / desktop) | `codex-project-template/` | `AGENTS.md` |\n\nDo this once per project, when you copy the template in:\n\n- Copy your folder's contents into the project root.\n- Fill in the `<PROJECT_NAME>` / `<PROJECT_ROOT>` / `<OWNER>` / `<DATE>` placeholders.\n
- Open the project in your agent and trust its hooks.\n- Run both `hooks/verify_*.ps1` checks — they should all pass.\n- Paste the first-session block from `DOCS/STARTUP_MESSAGE.md` into the first chat. (The prompts are also collected in [START_HERE.md](/budhasantosh010/ai-coding-project-templates/blob/main/START_HERE.md).)\n\nSaving context to a file is the easy half. The hard half is getting the agent to actually look at it once the conversation has moved on. Telling it \"check DECISIONS.md when unsure\" is a sign on the wall — it can walk right past it.\n\nSo the templates don't rely on that. Three hooks read your files and push the relevant bits back into the agent's view at the moments it tends to forget:\n\n```\nwhen                            hook                      what it puts back\n────                            ────                      ─────────────────\nsession starts or compacts      inject_context            CURRENT_STATE + the DEC/REQ/FAIL list\nyou send a message              inject_on_prompt          the active rules + \"read the transcript\"\nright before an edit            inject_decisions_preedit  the active DEC/REQ rules, at the edit\n```\n\nThe point: \"we use pnpm, not npm\" stops depending on the model remembering it. The rule is on screen at session start, on every message, and again right before the agent writes the install command. A hook can't be skipped, so the information is guaranteed to be there. And they all fail safe — if a hook errors it prints nothing and never blocks your session.\n\nRe-injecting your rules handles the recent stuff. But what about \"what did we decide about\npricing three months ago?\" — buried in a long transcript, maybe phrased with different words.\nThat's what the recall hook (`recall.ps1`, on every message) does, and it's built to cost almost\nnothing:\n\n```\n1. Does the message even look back? (\"remember…\", \"earlier\", \"that bug\", \"it/this\")\n   No  → do nothing. 0 tokens. 
(most messages)\n   Yes → continue.\n2. Resolve vague words: \"find it\" → the most-mentioned recent thing.\n3. Search, cheapest first:\n     tier 1  keyword over decisions + transcript        (free, instant)\n     tier 2  semantic by meaning — ONLY if tier 1 weak  (local model; \"login\" finds \"auth\")\n4. Verify: if the hit names a file, grep the CURRENT file → CONFIRMED or STALE.\n5. Inject a tiny cited pointer (~40 tokens): \"DEC-004 (msg 7): pnpm only [CONFIRMED]\"\n     …or, if nothing matched: \"not found\" — so the agent says so instead of inventing an answer.\n```\n\nTwo things make this trustworthy: it **cites** where the answer came from (decision id, message\nnumber, file), and it **admits when it doesn't know** rather than hallucinating. The expensive\nsemantic step only runs when the cheap keyword step comes up short — so on a normal message the\nrecall layer is silent and free.\n\n**Semantic search is optional.** It switches on only if you install one library\n(`uv pip install sentence-transformers`, or `pip install --user sentence-transformers`). Without\nit, recall works in keyword mode and everything still runs. The embedding model runs locally on\nyour CPU — zero API cost either way.\n\nA separate hook (`goal_convergence.ps1`, after each turn) keeps score against your ROOT goal. It\nreads the active decisions, open blockers, and whether recent work still overlaps the goal, then\nwrites a one-line status — `ON-TRACK`, `DRIFTING`, or `BLOCKED` — to `DOCS/GOAL_STATUS.md`, and\nsurfaces it only when it changes. It's a cheap code proxy (zero tokens), so it's an early-warning\nflag, not a verdict; for the real \"are we actually there?\" judgment you ask the agent directly at\na milestone.\n\nA long project is really a tree of decisions: one goal, a fork with a few options, you pick one, that becomes the new trunk, it forks again. 
Markdown is a bad shape for reading that — a flattened list loses which branch came from which fork. So the template also keeps the decisions as a tree you can actually look at.\n\nEvery real decision the agent makes is appended to `DOCS/_raw/decisions.jsonl` (with the user\nmessage number it came from, the options that were on the table, which was chosen, and the git\ncommit at that moment). After each turn a hook redraws three views — all pure scripting, zero\nmodel tokens:\n\n```\nDOCS/decision_tree.txt        the big picture as text: a left \"main goal\" spine, every\n                              decision branching off it, options fanning out, the picked one\n                              marked, down to a goal-check box. Code-drawn, so the layout is\n                              exact and never shifts.\nDOCS/decision_tree/msg_*.svg  one small clean picture PER message (renders cleanly because\n                              it's small). Append-only — the folder IS your history.\nDOCS/decision_tree_FULL.txt   every user message in tree shape, each tagged with the decision\n                              it produced or \"(no decision)\". The complete timeline.\nDOCS/decision_tree_history/   timestamped snapshots of the text views before each redraw.\n```\n\nThe text big-picture looks like this:\n\n```\n[ROOT] MAIN GOAL: never lose work across sessions\n  |\n  +-- {MSG 3}  session recovery     ( manual-copy )  <PICKED: bridge-tool>  ( ignore )\n  +-- {MSG 7}  template strategy     ( memory-only )  <PICKED: governance>   ( hybrid )\n  +-- {MSG 12} fix recall            ( instructions ) <PICKED: injection-hooks>\n  v\n[ROOT] EXPECTED FINAL GOAL  →  |GOAL CHECK| how close are we?\n```\n\n(The tree shows only the messages where a real decision was made — the forks. Every message,\ndecision or not, is still in `DOCS/_raw/user_messages.txt` and in the FULL timeline.)\n\nThe picture is for you. But it's also how you **direct the agent without ambiguity**. 
Instead\nof \"go back to where we decided that thing,\" you point at a node:\n\n```\nYou:   \"DEC-003 was the wrong call — roll back to it.\"\nAgent: hooks/rollback_to_decision.ps1 -Id DEC-003 -Apply\n       → git-reverts to that decision's stored commit, redraws the tree, marks later\n         decisions superseded. Deterministic — the commit hash is the single source of truth.\n```\n\nYou can point by decision id (`-Id DEC-003`) or by message number (`-Msg 48`) — both resolve to\none exact commit, so there's nothing for the agent to guess. (Rollback needs git in the project;\nthe agent always previews before applying.)\n\n```\n<root>/\n├─ CLAUDE.md / AGENTS.md   the rulebook the agent auto-loads\n├─ .claude/ or .codex/     wires up the logging + injection hooks\n├─ hooks/\n│   ├─ log_user_message.ps1          saves every message word-for-word (+ numbers them)\n│   ├─ inject_context.ps1            re-injects the spine on start / after compaction\n│   ├─ inject_on_prompt.ps1          injects active rules with every message\n│   ├─ inject_decisions_preedit.ps1  injects active rules right before an edit\n│   ├─ recall.ps1                    looks up the past on a look-back message (keyword + semantic)\n│   ├─ embed.py                      optional local semantic embedder ($0, by-meaning search)\n│   ├─ index_semantic.ps1            incrementally indexes new content for semantic recall\n│   ├─ record_decision.ps1           logs a decision (msg#, options, chosen, git commit)\n│   ├─ render_decision_tree.ps1      draws the text + per-message SVG + FULL timeline\n│   ├─ rollback_to_decision.ps1      \"roll back to DEC-X\" → git-revert + re-route tree\n│   ├─ goal_convergence.ps1          scores progress vs the ROOT goal\n│   ├─ verify_project_setup.ps1      checks every required file exists\n│   └─ verify_governance.ps1         checks the rules haven't been gutted\n└─ DOCS/\n   ├─ INDEX.md             map of all docs + which one wins in a conflict\n   ├─ 
CURRENT_STATE.md     what's verified true right now (+ the E0–E5 legend)\n   ├─ REQUIREMENTS.md      testable user needs (REQ-XXX)\n   ├─ DECISIONS.md         architecture choices and why (DEC-XXX)\n   ├─ FAILURE_REGISTRY.md  recurring bugs + the regression test (FAIL-XXX)\n   ├─ ANTI_DRIFT_PROTOCOL.md  short loop, three-strike, no silent fallback\n   ├─ CHANGE_POLICY.md     raw request → REQ → evidence → one commit → record\n   ├─ CHANGE_RECORD_TEMPLATE.md\n   ├─ GIT_RUNBOOK.md       safe commit / branch / rollback\n   ├─ HANDOVER_RUNBOOK.md  zero-context operator guide\n   ├─ STARTUP_MESSAGE.md   prompts to paste at session start\n   ├─ BOOTSTRAP_PROMPT.md  prompt to install this system into a fresh project\n   ├─ PROJECT_LOG.md       append-only history\n   ├─ BUILD_TRACKER.md     status board\n   ├─ STATECHART.md        optional visual\n   ├─ plans/ changes/ runs/\n   └─ _raw/user_messages.txt   exact word-for-word transcript\n```\n\nThe templates remember what you said and decided. They don't map where your code lives. On a\nbig repo that second kind of memory matters too, and [graphify](https://github.com/safishamsi/graphify)\nalready does it well — it builds a queryable graph of your code so the agent looks things up\ninstead of grepping through 200 files.\n\n```\nthese templates                         graphify\n───────────────                         ────────\nwhat did we decide / say / try?         where is the auth code, what calls it?\na diary                                 a map\n```\n\nDifferent jobs, no overlap. If you want both, here's how they fit — but a few things trip people up, so they're worth spelling out.\n\nThe templates and graphify are separate. 
Nothing in these templates installs or calls graphify, and copying a template in does not pull it in.\n\n```\ntemplate on its own       graphify isn't there, nothing happens\n+ graphify install        the agent starts using the graph during a session\n+ graphify hook install   the map rebuilds itself on every git commit\n```\n\ngraphify only starts doing anything after you run its own commands in a project. It never fires on its own from this repo.\n\nPeople run these together as one step and then wonder why the map is stale. They're four separate things:\n\n```\n1. install the tool       once per laptop, forever     uv tool install graphifyy\n2. wire it into a project  once per project             graphify install   (or --platform codex)\n3. build the first map     once per project             graphify .\n4. auto-refresh the map    once per project             graphify hook install\n```\n\nStep 4 is the one most people skip, and it's why \"install once and it runs itself\" is only half true. The tool installs once. But the map doesn't rebuild on its own until you add the post-commit hook in step 4 — until then, every code change leaves it a little more out of date.\n\nSo per project it's three quick commands:\n\n```\ngraphify install        # agent uses the graph\ngraphify .              
# build the first map\ngraphify hook install   # rebuild on every commit, then forget about it\n```\n\nCommit the `graphify-out/` folder so teammates start with the map already built, and query it\nwhenever you want:\n\n```\ngraphify query \"what connects auth to the database?\"\n```\n\nOn a real task the two systems hand off cleanly — the template supplies the rules, graphify supplies the map:\n\n```\n\"add rate-limiting to the login route\"\n   ├─ template:  DEC-004 pnpm only · REQ-002 needs an integration test\n   └─ graphify:  login route → AuthService → RateLimiter → Redis\n```\n\nBuilding the map is two jobs, and only one of them costs anything.\n\nReading code structure — functions, files, what calls what — runs locally with tree-sitter.\nThat's free; nothing leaves your machine. Understanding *meaning* (tying docs and PDFs to code,\nnaming concepts, summarizing) is sent to an LLM, and that's the part that costs tokens, because\nan actual model has to read it.\n\nThat split is also why the refresh runs on commit instead of constantly in the background —\neach rebuild spends a little on that LLM, so it waits for your commit rather than burning money\nwhile you sleep. You decide when it costs anything. And if you'd rather it cost nothing, point\nit at a local model (`--backend ollama`) and even the meaning step stays on your machine.\n\n(graphify is a separate project, not affiliated with this repo. The PyPI package is `graphifyy`\nwith a double y. Add `graphify-out/cost.json` to your `.gitignore`.)\n\nFor \"just map my code so the agent finds things fast,\" the free structural map is enough on its\nown. It already answers the questions you actually ask: where is `UserService` defined, what\ncalls `login()`, what does `auth.ts` import, what breaks if I change this.\n\nThe reason the paid meaning layer is mostly redundant is simple — your coding agent is already a model. 
It reads the structural map and works out the meaning itself, on the fly. Paying a second LLM up front to pre-chew that is doing a job your agent does for free as it goes.\n\nWhat you give up by skipping it: understanding non-code files like PDFs and design docs, inferred conceptual links that aren't written literally in the code, nicely-named clusters, and the \"why\" pulled out of comments. All nice to have, none of it needed to navigate code. It earns its keep when you've got a lot of docs to tie to the code, or a huge repo where the connections aren't obvious, or you're onboarding people who need the reasoning. Otherwise: structural map plus a capable agent is plenty.\n\nMIT. Use it, fork it, ship it.", "url": "https://wpnews.pro/news/hw-to-make-sure-ai-agents-neer-forget-the-work", "canonical_source": "https://github.com/budhasantosh010/ai-coding-project-templates", "published_at": "2026-06-24 14:26:59+00:00", "updated_at": "2026-06-24 14:39:59.868222+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-tools", "large-language-models", "generative-ai"], "entities": ["Claude Code", "OpenAI Codex", "CLAUDE.md", "AGENTS.md", "DOCS/STARTUP_MESSAGE.md", "START_HERE.md", "hooks/verify_*.ps1", "ai-coding-project-templates"], "alternates": {"html": "https://wpnews.pro/news/hw-to-make-sure-ai-agents-neer-forget-the-work", "markdown": "https://wpnews.pro/news/hw-to-make-sure-ai-agents-neer-forget-the-work.md", "text": "https://wpnews.pro/news/hw-to-make-sure-ai-agents-neer-forget-the-work.txt", "jsonld": "https://wpnews.pro/news/hw-to-make-sure-ai-agents-neer-forget-the-work.jsonld"}}