When Your Coding Agent Needs a Scribe, Not a Memory Engine


Over the past few weeks, I have had several conversations about the right way to give AI coding agents persistent memory.

Some developers ask about AgentMemory. Others ask about Qiju, which I built and maintain.

My usual response is: they are not solving the same problem. But I have not written that distinction down clearly.

This post is my attempt to do that.

The same surface problem

Both tools address the same frustration.

Every new coding session starts without context. The agent does not know which file is authoritative. It does not know what was decided in the previous session. It does not know which approaches have already failed or which assumptions have been verified.

The developer spends the opening minutes of every session reconstructing what the previous session established.

Both tools try to reduce that cost. But they approach it differently, and the difference matters in practice.

What AgentMemory does

AgentMemory runs as a local server and hooks into the lifecycle events of your coding session automatically.

With Claude Code, Codex, or another supported agent, it captures:

every prompt submitted;

every tool call the agent makes;

every result the agent receives;

session start and session end.

Those observations are compressed using a language model, embedded using a local or cloud embedding model, and indexed with BM25 and vector search. At the start of the next session, the most relevant memories are retrieved automatically and injected into the conversation. You do not need to do anything.
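
The article does not say how AgentMemory merges its BM25 and vector rankings into one list. As a rough illustration only, the sketch below uses reciprocal rank fusion, a common generic technique for combining two ranked lists. The function name, the memory identifiers, and the constant `k=60` are all assumptions made for this example and are not taken from AgentMemory.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of ids into one ranking.

    Each id earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical memory ids, as a keyword index and a vector index might rank them.
bm25_ranking = ["mem-auth-retry", "mem-n1-query", "mem-ci-flake"]
vector_ranking = ["mem-n1-query", "mem-db-index", "mem-auth-retry"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

A memory that appears near the top of both lists outranks one that only a single retriever found, which is the practical point of indexing the same observations two ways.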

AgentMemory has published reproducible benchmark results: 95.2% recall accuracy on LongMemEval-S. Semantic retrieval can find "database performance" when you actually wrote "N+1 query fix." Keyword search cannot do that.

What Qiju does

Qiju does not capture anything automatically.

When I want a record to persist, I or my agent intentionally invokes /qiju-log inside Claude Code, or $qiju-log inside Codex.

That produces a structured handoff record:

title:         Selected token refresh strategy
tags:          auth, architecture
search_terms:  refresh token, 401 retry
next_steps:    implement bounded retry, add expiry regression test
body:          Decision, evidence, rejected alternatives, and handoff notes.

The record is stored in plain JSONL files on the developer's machine. Nothing is embedded. Nothing leaves the machine. Retrieval is deterministic keyword and tag search.

In the next session, the agent searches for what it needs. The developer sees exactly what was recorded and why.
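
The storage and retrieval model described here is small enough to sketch. This is not Qiju's actual code: the file name `records.jsonl`, the functions `log_record` and `search`, and the exact matching rules are assumptions for illustration. Only the field names come from the record shown above.

```python
import json
import tempfile
from pathlib import Path


def log_record(path: Path, record: dict) -> None:
    """Append one record as a single JSON line (the JSONL convention)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def search(path: Path, keyword: str = "", tag: str = "") -> list[dict]:
    """Return records matching a case-insensitive keyword and/or an exact tag.

    Matching is plain substring and membership testing, so the same query
    against the same file always returns the same records in file order.
    """
    if not path.exists():
        return []
    needle = keyword.lower()
    hits = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            rec = json.loads(line)
            haystack = " ".join(
                [rec.get("title", ""), rec.get("body", ""), *rec.get("search_terms", [])]
            ).lower()
            if needle and needle not in haystack:
                continue
            if tag and tag not in rec.get("tags", []):
                continue
            hits.append(rec)
    return hits


# Hypothetical store location; a temporary directory keeps the demo side-effect free.
store = Path(tempfile.mkdtemp()) / "records.jsonl"
log_record(store, {
    "title": "Selected token refresh strategy",
    "tags": ["auth", "architecture"],
    "search_terms": ["refresh token", "401 retry"],
    "next_steps": ["implement bounded retry", "add expiry regression test"],
    "body": "Decision, evidence, rejected alternatives, and handoff notes.",
})

by_keyword = search(store, keyword="401 retry")
by_tag = search(store, tag="auth")
no_match = search(store, keyword="database performance")
```

Because each record is one line of JSON, the file can be read with any text tool and diffs cleanly, which fits the auditable, file-based model the post describes.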

The real distinction

It is tempting to describe AgentMemory as "automatic Qiju," or Qiju as "manual AgentMemory." That is not quite accurate.

The two tools are answering different questions.

AgentMemory asks: what did the agent observe during previous sessions?

Qiju asks: what did the developer judge worth preserving about this project?

An internal log of every tool call is not the same as a structured record of why a decision was made, what evidence supported it, and which approaches were already rejected.

A memory engine accumulates what happened.
A record layer captures what matters.

Why the distinction is practical

Imagine a session in which I investigated an authentication bug, tried three approaches, found that the second one caused a regression, and decided to roll back the change and record a specific boundary condition in the architecture notes.

An automatic memory capture would produce:

observations of every file read;

observations of every command run;

observations of every test failure;

a compressed session summary.

That is useful background.

A deliberate Qiju record would produce:

the specific file that is the current ground truth;

the decision and the reason for it;

the two rejected approaches and why they failed;

the next step for the following agent.

The next agent does not need the full observational trace. It needs to know what was decided, where to find the evidence, and what not to repeat.

Those are different artefacts.

A concrete example

In the post I wrote about Codex's SQLite logging problem, I described this kind of record:

Ground truth:  design/pypi-packaging-execution-plan.md
Decision:      Use one canonical src/qiju package for source and PyPI installs.
Verified:      Wheel and sdist passed artifact inspection and installed-wheel smoke tests.
Rejected:      Renaming scripts to qiju only during the staging build.
Next:          Publish to TestPyPI and repeat the clean-install lifecycle test.

A complete session transcript might contain tens of thousands of words around those conclusions. The durable project record is much smaller.

The next agent does not need the whole session. It needs the right record.

The operational differences

Beyond philosophy, the two tools make different operational assumptions.

AgentMemory requires:

a Rust binary (iii-engine) running as a background daemon;

Node.js installed;

an embedding model, either local or from a configured API provider.

If an external embedding provider is configured, content leaves the machine. The local all-MiniLM-L6-v2 model avoids this but adds setup steps. AgentMemory supports over twenty agents including Claude Code, Codex, Cursor, Cline, Gemini CLI, and others. It runs on macOS, Linux, and Windows.

Qiju requires:

Python 3.11 or later;

nothing else.

Records stay in plain JSONL files on the developer's machine. They can be committed to git alongside the code. Qiju supports Claude Code, Codex, Kiro, and Cursor. It runs on macOS and Linux. Windows is not yet supported.

Where they compose

I do not think of these tools as competitors.

A team could run AgentMemory for automatic background capture (what the agent did, when, and in which files) and run Qiju for the intentional layer (what was decided and why).

AgentMemory recalls what the agent did.

Qiju records what the developer decided.

That is a natural division of responsibility.

Current limits of both

AgentMemory's automation is valuable precisely because most developers will not log anything if the effort is nonzero. But automatic capture at PostToolUse granularity produces high volume. The compression step reduces it, but the signal-to-noise ratio depends on how well the LLM identifies what was important. The daemon and embedding pipeline add operational complexity.

Qiju's intentional capture is a feature to some developers and a friction point to others. If you forget to log something, it is not recorded, though a Stop hook in your agent configuration can instruct the agent to run qiju-log before the session closes, which reduces the chance of an unrecorded session. There is no semantic search. Retrieval depends on using good keywords and tags at log time. The record is only as good as the effort put into writing it.
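
The dependence on log-time keywords is easy to demonstrate. In the minimal sketch below, `keyword_match` and both records are hypothetical and only approximate how a deterministic substring search behaves. The same fix is found or missed depending on which search terms were written when it was logged.

```python
def keyword_match(record: dict, query: str) -> bool:
    """Case-insensitive substring match over the title and search terms."""
    text = " ".join([record["title"], *record["search_terms"]]).lower()
    return query.lower() in text


# Same fix, logged twice with different amounts of care (hypothetical records).
terse = {"title": "N+1 query fix", "search_terms": ["n+1", "orm"]}
generous = {
    "title": "N+1 query fix",
    "search_terms": ["n+1", "orm", "database performance", "slow page load"],
}

terse_found = keyword_match(terse, "database performance")
generous_found = keyword_match(generous, "database performance")
print(terse_found, generous_found)
```

The terse record is invisible to a later search for "database performance", while the generous one is found, so a few extra search terms at log time stand in for what semantic search would otherwise supply.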

Neither tool solves every problem. Both projects are candid about their current limits.

My conclusion

The question of agent memory is not settled. The tooling is early and changing quickly.

If zero-friction automatic capture matters most, and you are comfortable with the operational requirements, AgentMemory is worth exploring.

If deliberate, auditable, file-based records matter most, and you want something with no background services and no external dependencies, Qiju fits that model better.

If both matter, the tools compose.

source & further reading

Original article: dev.to
