A coding agent forgets. Halfway through a build the context compacts, and suddenly it's re-asking things you settled an hour ago or quietly ignoring a rule you set on day one. These are two drop-in templates (one for Claude Code, one for Codex) that fix that by keeping the project's memory in files and feeding it back to the agent automatically.
ai-coding-project-templates/
├─ claude-project-template/   for Claude Code (rulebook = CLAUDE.md, hook via .claude/settings.json)
└─ codex-project-template/    for OpenAI Codex (rulebook = AGENTS.md, hook via .codex/hooks.json)
Both folders are the same system. The only differences are the rulebook filename and how each agent registers hooks.
- What problem do these solve?
- Pick your agent
- Install (per project)
- How recall is forced, not hoped-for
- Looking things up by meaning (recall)
- Staying pointed at the goal
- The decision tree (see it, and roll back to it)
- What's inside each template
- Optional companion: graphify
- License
Four things that go wrong on any long-running agent session, and what the template does about each:
| Without | With |
|---|---|
| Forgets between sessions; you re-explain the project constantly. | Every message, decision and change lands in DOCS/, so a fresh session reloads the full context. |
| "Done" means "the code exists," even if it never ran. | "Done" means a passing test or a real run (E0–E5), not just code that imports. |
| Same error retried forever. | Three strikes on the same failure → stop and diagnose. |
| Silent fallbacks; lost reasoning. | Fallbacks must be stated. Decisions and failures are logged with the test that stops them returning. |
| You use | Open this folder | Rulebook |
|---|---|---|
| Claude Code (CLI / VS Code / desktop) | claude-project-template/ | `CLAUDE.md` |
| OpenAI Codex (CLI / VS Code / desktop) | codex-project-template/ | `AGENTS.md` |
Do this once per project, when you copy the template in:
- Copy your folder's contents into the project root.
- Fill in the `<PROJECT_NAME>` / `<PROJECT_ROOT>` / `<OWNER>` / `<DATE>` placeholders.
- Open the project in your agent and trust its hooks.
- Run both `hooks/verify_*.ps1` checks; they should all pass.
- Paste the first-session block from `DOCS/STARTUP_MESSAGE.md` into the first chat. (The prompts are also collected in `START_HERE.md`.)
Saving context to a file is the easy half. The hard half is getting the agent to actually look at it once the conversation has moved on. Telling it "check DECISIONS.md when unsure" is a sign on the wall: it can walk right past it.
So the templates don't rely on that. Three hooks read your files and push the relevant bits back into the agent's view at the moments it tends to forget:
| When | Hook | What it puts back |
|---|---|---|
| session starts or compacts | inject_context | CURRENT_STATE + the DEC/REQ/FAIL list |
| you send a message | inject_on_prompt | the active rules + "read the transcript" |
| right before an edit | inject_decisions_preedit | the active DEC/REQ rules, at the edit |
The point: "we use pnpm, not npm" stops depending on the model remembering it. The rule is on screen at session start, on every message, and again right before the agent writes the install command. The agent can't skip a hook, so as long as the hook runs cleanly the information is there. And they all fail safe: if a hook errors it prints nothing and never blocks your session.
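As a rough illustration of that fail-safe behavior, here is a minimal Python sketch of a session-start injection. The templates' real hooks are PowerShell scripts; the function name `build_injection`, the 40-line cap, and the way rules are picked out of `DECISIONS.md` are assumptions made for this example, not the template's actual implementation.

```python
from pathlib import Path


def build_injection(docs_dir: Path, max_lines: int = 40) -> str:
    """Return the text to push back into the agent's view, or "" on any failure."""
    try:
        state = (docs_dir / "CURRENT_STATE.md").read_text(encoding="utf-8")
        decisions = (docs_dir / "DECISIONS.md").read_text(encoding="utf-8")
        # Assumption: a rule is any line that mentions a DEC-/REQ-/FAIL- id.
        rules = [
            line.strip()
            for line in decisions.splitlines()
            if any(tag in line for tag in ("DEC-", "REQ-", "FAIL-"))
        ]
        body = state.splitlines()[:max_lines] + ["Active rules:"] + rules
        return "\n".join(body)
    except Exception:
        # Fail safe: a broken hook injects nothing and never blocks the session.
        return ""


if __name__ == "__main__":
    # Whatever is printed here is what the agent would see.
    print(build_injection(Path("DOCS")), end="")
```

The broad `except` is deliberate in this sketch: a missing or unreadable file degrades to "no injection" rather than an error the session has to deal with.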
Re-injecting your rules handles the recent stuff. But what about "what did we decide about pricing three months ago?", buried in a long transcript and maybe phrased with different words? That's what the recall hook (`recall.ps1`, on every message) does, and it's built to cost almost nothing:
1. Does the message even look back? ("remember…", "earlier", "that bug", "it/this")
   - No → do nothing. 0 tokens. (most messages)
   - Yes → continue.
2. Resolve vague words: "find it" → the most-mentioned recent thing.
3. Search, cheapest first:
   - tier 1, keyword: over decisions + transcript (free, instant)
   - tier 2, semantic: by meaning, ONLY if tier 1 is weak (local model; "login" finds "auth")
4. Verify: if the hit names a file, grep the CURRENT file → CONFIRMED or STALE.
5. Inject a tiny cited pointer (~40 tokens): "DEC-004 (msg 7): pnpm only [CONFIRMED]"
   - …or, if nothing matched: "not found", so the agent says so instead of inventing an answer.
Two things make this trustworthy: it cites where the answer came from (decision id, message number, file), and it admits when it doesn't know rather than hallucinating. The expensive semantic step only runs when the cheap keyword step comes up short, so on a normal message the recall layer is silent and free.
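The gate, the keyword tier, and the explicit "not found" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cue list, the stopword set, the record fields (`id`, `msg`, `text`), and the scoring by shared words are all hypothetical, and the semantic tier and the CONFIRMED/STALE check are left out.

```python
import re
from typing import Optional

# Hypothetical look-back cues; the real hook's list is not shown in this document.
LOOKBACK = re.compile(r"\b(remember|earlier|previously|decided|that bug)\b", re.IGNORECASE)
STOPWORDS = {"what", "did", "about", "that", "this", "remember", "earlier",
             "previously", "decide", "decided", "talked"}


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def recall(message: str, decisions: list) -> Optional[str]:
    """Return a cited pointer, "not found", or None when the message does not look back."""
    if not LOOKBACK.search(message):
        return None  # step 1: no look-back cue, so inject nothing
    query = {w for w in _tokens(message) if len(w) > 2 and w not in STOPWORDS}
    best, best_score = None, 0
    for dec in decisions:
        score = len(query & _tokens(dec["text"]))  # tier 1: plain keyword overlap
        if score > best_score:
            best, best_score = dec, score
    if best is None:
        return "not found"  # say so instead of letting the agent guess
    return f"{best['id']} (msg {best['msg']}): {best['text']}"
```

For example, with a decision `{"id": "DEC-004", "msg": 7, "text": "pnpm only"}` on record, the message "remember what we decided about pnpm?" yields a cited pointer, while an ordinary request such as "add a button" yields `None` and costs nothing.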
Semantic search is optional. It switches on only if you install one library (`uv pip install sentence-transformers`, or `pip install --user sentence-transformers`). Without it, recall works in keyword mode and everything still runs. The embedding model runs locally on your CPU, so there is zero API cost either way.
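One common way to make a dependency optional like this is to check whether the package is importable before choosing a mode. The sketch below assumes that approach; `recall_mode` is a hypothetical helper, and the template's own detection logic may differ.

```python
import importlib.util


def recall_mode() -> str:
    """Return "semantic" when sentence-transformers is importable, else "keyword"."""
    # find_spec returns None for a missing top-level package instead of raising.
    available = importlib.util.find_spec("sentence_transformers") is not None
    return "semantic" if available else "keyword"


if __name__ == "__main__":
    print(recall_mode())
```

Checking for the package without importing it keeps the common path cheap: the embedding model is only loaded when a semantic search is actually needed.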
A separate hook (`goal_convergence.ps1`, after each turn) keeps score against your ROOT goal. It reads the active decisions, open blockers, and whether recent work still overlaps the goal, then writes a one-line status (`ON-TRACK`, `DRIFTING`, or `BLOCKED`) to `DOCS/GOAL_STATUS.md`, and surfaces it only when it changes. It's a cheap code proxy (zero tokens), so it's an early-warning flag, not a verdict; for the real "are we actually there?" judgment you ask the agent directly at a milestone.
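To make "cheap code proxy" concrete, here is one way such a status could be computed with no model call at all. The word-overlap measure, the 0.2 threshold, and the function names are assumptions for illustration; the actual scoring in `goal_convergence.ps1` is not described in this document.

```python
import re
from typing import Optional


def _words(text: str) -> set:
    # Ignore very short words so "the", "to", "and" do not count as overlap.
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 3}


def goal_status(goal: str, recent_work: list, open_blockers: int,
                min_overlap: float = 0.2) -> str:
    """Classify progress as ON-TRACK, DRIFTING, or BLOCKED from plain text."""
    if open_blockers > 0:
        return "BLOCKED"
    goal_words = _words(goal)
    work_words = set().union(*map(_words, recent_work))
    overlap = len(goal_words & work_words) / len(goal_words) if goal_words else 0.0
    return "ON-TRACK" if overlap >= min_overlap else "DRIFTING"


def surface_if_changed(previous: Optional[str], current: str) -> Optional[str]:
    """Return the status only when it differs from the last one written."""
    return current if current != previous else None
```

A proxy like this can only notice that recent work has stopped mentioning the goal; it cannot tell whether the work is any good, which is why the status is a flag rather than a verdict.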
A long project is really a tree of decisions: one goal, a fork with a few options, you pick one, that becomes the new trunk, it forks again. Markdown is a bad shape for reading that: a flattened list loses which branch came from which fork. So the template also keeps the decisions as a tree you can actually look at.
Every real decision the agent makes is appended to `DOCS/_raw/decisions.jsonl` (with the user message number it came from, the options that were on the table, which was chosen, and the git commit at that moment). After each turn a hook redraws three views, all pure scripting, zero model tokens:
- `DOCS/decision_tree.txt`: the big picture as text. A left "main goal" spine, every decision branching off it, options fanning out, the picked one marked, down to a goal-check box. Code-drawn, so the layout is exact and never shifts.
- `DOCS/decision_tree/msg_*.svg`: one small clean picture PER message (renders cleanly because it's small). Append-only: the folder IS your history.
- `DOCS/decision_tree_FULL.txt`: every user message in tree shape, each tagged with the decision it produced or "(no decision)". The complete timeline.
- `DOCS/decision_tree_history/`: timestamped snapshots of the text views before each redraw.
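For readers unfamiliar with the JSONL format, the append-only decision log can be pictured as one JSON object per line. The sketch below is an assumption about shape only: the field names (`id`, `msg`, `topic`, `options`, `chosen`, `commit`) are hypothetical, and the template's `record_decision.ps1` may store things differently.

```python
import json
from pathlib import Path


def record_decision(log_path: Path, dec_id: str, msg: int, topic: str,
                    options: list, chosen: str, commit: str) -> dict:
    """Append one decision as a single JSON line and return the stored entry."""
    if chosen not in options:
        raise ValueError(f"{chosen!r} is not one of the options on the table")
    entry = {"id": dec_id, "msg": msg, "topic": topic,
             "options": options, "chosen": chosen, "commit": commit}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:  # append-only: never rewrite history
        fh.write(json.dumps(entry) + "\n")
    return entry


def load_decisions(log_path: Path) -> list:
    """Read the log back, one decision per non-empty line."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is a complete record, the renderers can rebuild every view from this one file without touching a model.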
The text big-picture looks like this:
[ROOT] MAIN GOAL: never lose work across sessions
|
+-- {MSG 3} session recovery ( manual-copy ) <PICKED: bridge-tool> ( ignore )
+-- {MSG 7} template strategy ( memory-only ) <PICKED: governance> ( hybrid )
+-- {MSG 12} fix recall ( instructions ) <PICKED: injection-hooks>
v
[ROOT] EXPECTED FINAL GOAL → |GOAL CHECK| how close are we?
(The tree shows only the messages where a real decision was made: the forks. Every message, decision or not, is still in `DOCS/_raw/user_messages.txt` and in the FULL timeline.)
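Drawing that text view from decision records needs nothing more than string formatting, which is why it costs zero tokens. The following Python sketch approximates the layout shown above; the record fields are the same hypothetical ones (`msg`, `topic`, `options`, `chosen`), the final arrow is written in ASCII, and the real `render_decision_tree.ps1` output may differ in detail.

```python
def render_tree(goal: str, decisions: list) -> str:
    """Draw the goal spine with one branch per decision, in message order."""
    lines = [f"[ROOT] MAIN GOAL: {goal}", "  |"]
    for dec in sorted(decisions, key=lambda d: d["msg"]):
        # The chosen option is marked; the rejected ones stay visible in parentheses.
        options = " ".join(
            f"<PICKED: {opt}>" if opt == dec["chosen"] else f"( {opt} )"
            for opt in dec["options"]
        )
        lines.append(f"  +-- {{MSG {dec['msg']}}} {dec['topic']} {options}")
    lines += ["  v", "[ROOT] EXPECTED FINAL GOAL -> |GOAL CHECK| how close are we?"]
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"msg": 7, "topic": "template strategy",
         "options": ["memory-only", "governance", "hybrid"], "chosen": "governance"},
        {"msg": 3, "topic": "session recovery",
         "options": ["manual-copy", "bridge-tool", "ignore"], "chosen": "bridge-tool"},
    ]
    print(render_tree("never lose work across sessions", sample))
```

Since the output is a pure function of the records, redrawing after every turn always gives the same picture for the same log.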
The picture is for you. But it's also how you direct the agent without ambiguity. Instead of "go back to where we decided that thing," you point at a node:
You: "DEC-003 was the wrong call. Roll back to it."
Agent: hooks/rollback_to_decision.ps1 -Id DEC-003 -Apply
→ git-reverts to that decision's stored commit, redraws the tree, marks later
decisions superseded. Deterministic: the commit hash is the single source of truth.
You can point by decision id (`-Id DEC-003`) or by message number (`-Msg 48`); both resolve to one exact commit, so there's nothing for the agent to guess. (Rollback needs git in the project; the agent always previews before applying.)
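The "nothing to guess" property comes from the lookup being an exact match against stored records. Here is a small Python sketch of that resolution step and of the preview of what would be superseded. It runs no git command, and the record fields and function names are hypothetical stand-ins, not the behavior of `rollback_to_decision.ps1` itself.

```python
from typing import Optional


def resolve_commit(decisions: list, dec_id: Optional[str] = None,
                   msg: Optional[int] = None) -> str:
    """Map a decision id or a message number to exactly one stored commit hash."""
    if (dec_id is None) == (msg is None):
        raise ValueError("pass exactly one of dec_id or msg")
    if dec_id is not None:
        matches = [d for d in decisions if d["id"] == dec_id]
    else:
        matches = [d for d in decisions if d["msg"] == msg]
    if len(matches) != 1:
        # Refuse to guess: zero or several matches is an error, not a best effort.
        raise LookupError(f"expected one matching decision, found {len(matches)}")
    return matches[0]["commit"]


def superseded_after(decisions: list, target_msg: int) -> list:
    """Preview which later decisions a rollback to target_msg would supersede."""
    return [d["id"] for d in decisions if d["msg"] > target_msg]
```

Failing loudly on an ambiguous or unknown target is the design choice that matters here: a rollback should either land on the recorded commit or not happen.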
<root>/
├─ CLAUDE.md / AGENTS.md              the rulebook the agent auto-loads
├─ .claude/ or .codex/                wires up the logging + injection hooks
├─ hooks/
│  ├─ log_user_message.ps1            saves every message word-for-word (+ numbers them)
│  ├─ inject_context.ps1              re-injects the spine on start / after compaction
│  ├─ inject_on_prompt.ps1            injects active rules with every message
│  ├─ inject_decisions_preedit.ps1    injects active rules right before an edit
│  ├─ recall.ps1                      looks up the past on a look-back message (keyword + semantic)
│  ├─ embed.py                        optional local semantic embedder ($0, by-meaning search)
│  ├─ index_semantic.ps1              incrementally indexes new content for semantic recall
│  ├─ record_decision.ps1             logs a decision (msg#, options, chosen, git commit)
│  ├─ render_decision_tree.ps1        draws the text + per-message SVG + FULL timeline
│  ├─ rollback_to_decision.ps1        "roll back to DEC-X" → git-revert + re-route tree
│  ├─ goal_convergence.ps1            scores progress vs the ROOT goal
│  ├─ verify_project_setup.ps1        checks every required file exists
│  └─ verify_governance.ps1           checks the rules haven't been gutted
└─ DOCS/
   ├─ INDEX.md                        map of all docs + which one wins in a conflict
   ├─ CURRENT_STATE.md                what's verified true right now (+ the E0–E5 legend)
   ├─ REQUIREMENTS.md                 testable user needs (REQ-XXX)
   ├─ DECISIONS.md                    architecture choices and why (DEC-XXX)
   ├─ FAILURE_REGISTRY.md             recurring bugs + the regression test (FAIL-XXX)
   ├─ ANTI_DRIFT_PROTOCOL.md          short loop, three-strike, no silent fallback
   ├─ CHANGE_POLICY.md                raw request → REQ → evidence → one commit → record
   ├─ CHANGE_RECORD_TEMPLATE.md
   ├─ GIT_RUNBOOK.md                  safe commit / branch / rollback
   ├─ HANDOVER_RUNBOOK.md             zero-context operator guide
   ├─ STARTUP_MESSAGE.md              prompts to paste at session start
   ├─ BOOTSTRAP_PROMPT.md             prompt to install this system into a fresh project
   ├─ PROJECT_LOG.md                  append-only history
   ├─ BUILD_TRACKER.md                status board
   ├─ STATECHART.md                   optional visual
   ├─ plans/ changes/ runs/
   └─ _raw/user_messages.txt          exact word-for-word transcript
The templates remember what you said and decided. They don't map where your code lives. On a big repo that second kind of memory matters too, and graphify already does it well: it builds a queryable graph of your code so the agent looks things up instead of grepping through 200 files.
| these templates | graphify |
|---|---|
| what did we decide / say / try? | where is the auth code, what calls it? |
| a diary | a map |
Different jobs, no overlap. If you want both, here's how they fit. A few things trip people up, so they're worth spelling out.
No. They're separate. Nothing in these templates installs or calls graphify, and copying a template in does not pull it in.
| Setup | What happens |
|---|---|
| template on its own | graphify isn't there, nothing happens |
| + `graphify install` | the agent starts using the graph during a session |
| + `graphify hook install` | the map rebuilds itself on every git commit |
graphify only starts doing anything after you run its own commands in a project. It never fires on its own from this repo.
People run these together as one step and then wonder why the map is stale. They're four separate things:
| Step | How often | Command |
|---|---|---|
| 1. install the tool | once per laptop, forever | `uv tool install graphifyy` |
| 2. wire it into a project | once per project | `graphify install` (or `--platform codex`) |
| 3. build the first map | once per project | `graphify .` |
| 4. auto-refresh the map | once per project | `graphify hook install` |
Step 4 is the one most people skip, and it's why "install once and it runs itself" is only half true. The tool installs once. But the map doesn't rebuild on its own until you add the post-commit hook in step 4; until then, every code change leaves it a little more out of date.
So per project it's three quick commands:
graphify install # agent uses the graph
graphify . # build the first map
graphify hook install # rebuild on every commit, then forget about it
Commit the `graphify-out/` folder so teammates start with the map already built, and query it whenever you want:
graphify query "what connects auth to the database?"
On a real task the two systems hand off cleanly; the template supplies the rules, graphify supplies the map:
"add rate-limiting to the login route"
├─ template: DEC-004 pnpm only · REQ-002 needs an integration test
└─ graphify: login route → AuthService → RateLimiter → Redis
Building the map is two jobs, and only one of them costs anything.
Reading code structure (functions, files, what calls what) runs locally with tree-sitter. That's free; nothing leaves your machine. Understanding meaning (tying docs and PDFs to code, naming concepts, summarizing) is sent to an LLM, and that's the part that costs tokens, because an actual model has to read it.
That split is also why the refresh runs on commit instead of constantly in the background: each rebuild spends a little on that LLM, so it waits for your commit rather than burning money while you sleep. You decide when it costs anything. And if you'd rather it cost nothing, point it at a local model (`--backend ollama`) and even the meaning step stays on your machine.
(graphify is a separate project, not affiliated with this repo. The PyPI package is `graphifyy` with a double y. Add `graphify-out/cost.json` to your `.gitignore`.)
For "just map my code so the agent finds things fast," the free structural map is enough on its own. It already answers the questions you actually ask: where is `UserService` defined, what calls `login()`, what does `auth.ts` import, what breaks if I change this.
The reason the paid meaning layer is mostly redundant is simple: your coding agent is already a model. It reads the structural map and works out the meaning itself, on the fly. Paying a second LLM up front to pre-chew that is doing a job your agent does for free as it goes.
What you give up by skipping it: understanding non-code files like PDFs and design docs, inferred conceptual links that aren't written literally in the code, nicely-named clusters, and the "why" pulled out of comments. All nice to have, none of it needed to navigate code. It earns its keep when you've got a lot of docs to tie to the code, or a huge repo where the connections aren't obvious, or you're onboarding people who need the reasoning. Otherwise: structural map plus a capable agent is plenty.
MIT. Use it, fork it, ship it.