How others build agent memory, and what I took from each

Falconer engineer Ben introduced agent signals to solve the problem of AI agents starting every conversation from zero, unable to remember a user's role, preferences, or communication style. The system automatically extracts durable self-statements from user conversations every six hours and allows manual signal entry, then injects all signals into the system prompt at the start of each conversation. This approach ensures agents answer questions with appropriate context, format responses according to user preferences, and avoid disliked stylistic choices without requiring repeated corrections.

Starting from zero, every time

An engineer at Falconer asks our agent what’s safe to change to ship a new payments retry path. They’ve owned this code for two years. The agent answers like they’re new to the codebase. It explains what the orchestration layer is, where the idempotency keys live, the basics of the retry queue. None of that was useful.

A different user prefers tight bullet lists for meeting summaries. The agent returns three paragraphs of flowing prose.

Another user hates em dashes in writing. Every draft the agent produces is laced with them.

None of this is a knowledge problem. The agent can look things up. It’s an identity and style problem the agent has no way to solve from a single conversation. By the time the user has corrected the tone or the framing three or four times, the conversation ends and the next one starts cold. Every interaction starts at zero.

That’s the failure mode agent signals was built to fix.

An agent signal is a short, durable, self-stated thing about the user. Their role. Their team. What they own. How they prefer to communicate. How they like things written. The signals get pulled into the system prompt at the start of every agent conversation, so the agent can write with that context baked in.
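
To make the idea concrete, here is a minimal sketch of signals being folded into a system prompt. It is an illustration under assumptions, not Falconer's actual code: the `Signal` class and the `render_user_block` and `build_system_prompt` helpers are hypothetical names, and the exact block layout is a guess beyond the "About this user" label and bullet format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One short, durable, self-stated fact about a user (hypothetical shape)."""
    text: str


def render_user_block(signals: list[Signal]) -> str:
    """Render every signal as one bullet; no retrieval or filtering step."""
    if not signals:
        return ""
    bullets = "\n".join(f"- {s.text}" for s in signals)
    return f"About this user\n{bullets}"


def build_system_prompt(base_prompt: str, signals: list[Signal]) -> str:
    """Append the user block to the base prompt, or return it unchanged."""
    block = render_user_block(signals)
    return f"{base_prompt}\n\n{block}" if block else base_prompt


if __name__ == "__main__":
    signals = [
        Signal("Owns the payments retry path"),
        Signal("Prefers tight bullet lists for meeting summaries"),
        Signal("Dislikes em dashes in writing"),
    ]
    print(build_system_prompt("You are a helpful agent.", signals))
```

The point of the sketch is how little machinery is involved: a user with no signals gets the base prompt untouched, and a user with signals gets the same few lines in every conversation.
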
Same questions, different answers. The codebase question gets an answer that assumes you know the codebase. The meeting summary comes back as bullets. The draft doesn’t have em dashes.

That’s the whole feature. The substance is everything underneath: how signals get in, what counts as a signal in the first place, and what to do when there are too many to fit. The first two are about prompt design and ingestion plumbing. The last one is where the design choices get interesting, because it turns out every production AI memory system has solved it differently, and the differences matter.

Two write paths, one read path

There are two write paths for agent signals in Falconer, and one read path that gathers them up at conversation start.

Automatic extraction

Every six hours, a background job scans every conversation flagged for extraction. For each user, it loads the user’s messages from those conversations (assistant turns and tool output are ignored), passes them to a deliberately restrictive LLM prompt, and gets back structured actions: create a new signal, update an existing one, or skip. Returning zero signals from a conversation is the expected outcome, not the exception.

The prompt aggressively rejects anything that isn’t a durable self-statement. Task context like “currently debugging payments” gets skipped because it varies across conversations. Org-level facts like “Falconer uses Postgres” get skipped because they apply to everyone in the org. Neither is a signal about who the user is.

Manual entry

Users can add or edit signals directly in a settings page. This path matters more than it sounds. The user is the ground truth about themselves; the extraction LLM is a guess. If the system ever silently overwrites or contradicts something a user typed in, users stop trusting what’s stored about them. So manual signals get treated as sacred.
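
As a rough sketch of how the two write paths could interact, the snippet below applies a batch of extraction actions while refusing to touch anything the user typed in. Everything here is an assumption for illustration: the `Signal` and `Action` shapes, the `source` field, the integer ids, and the `apply_actions` helper are hypothetical, not Falconer's schema.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass
class Signal:
    id: int
    text: str
    source: Literal["auto", "manual"]  # hypothetical marker for the write path


@dataclass
class Action:
    kind: Literal["create", "update", "skip"]
    text: Optional[str] = None
    target_id: Optional[int] = None  # only meaningful for "update"


def apply_actions(signals: list[Signal], actions: list[Action]) -> list[Signal]:
    """Apply extraction actions; manual signals are never overwritten."""
    by_id = {s.id: s for s in signals}
    next_id = max(by_id, default=0) + 1
    for action in actions:
        if action.kind == "skip":
            continue  # the common case: nothing durable was said
        if action.kind == "create" and action.text:
            by_id[next_id] = Signal(next_id, action.text, "auto")
            next_id += 1
        elif action.kind == "update" and action.text:
            target = by_id.get(action.target_id)
            if target is None or target.source == "manual":
                continue  # unknown target, or user-authored: leave it alone
            by_id[target.id] = Signal(target.id, action.text, "auto")
    return list(by_id.values())


if __name__ == "__main__":
    stored = [
        Signal(1, "Staff engineer on payments", "manual"),
        Signal(2, "Prefers concise writing", "auto"),
    ]
    proposed = [
        Action("update", "Junior engineer", target_id=1),  # ignored: manual
        Action("update", "Prefers tight bullet lists", target_id=2),
        Action("skip"),
    ]
    for signal in apply_actions(stored, proposed):
        print(signal)
```

Putting the guard in the apply step, instead of trusting the extraction prompt to behave, means a bad guess from the LLM cannot contradict what a user wrote about themselves.
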
Foreshadowing the tiering design later: manual signals live in their own bucket, and the system promises never to consolidate or drop them.

The read path

At the start of every agent conversation, I pull all of the user’s signals and render them as a bullet block under “About this user” in the system prompt. No retrieval step. No semantic search. The full list goes in.

That last part deserves explaining, because the obvious instinct is “this should be RAG.” It shouldn’t. There are tens of signals per user, not thousands. The math says inject everything. The model is better at deciding which signals are relevant to the current turn than any retrieval scheme I’d build.

This was the first design decision I made, and the reason I made it was that ChatGPT and Claude Code had both arrived at the same conclusion, independently, at much larger scale. That’s what got me to read everything I could find about how they actually work.

What others have figured out

Three production AI memory systems mattered most to the design: OpenAI’s ChatGPT memory, Anthropic’s Claude Code memory, and the MemGPT / Letta core-memory architecture. Each has a different shape, and each makes a different bet about what’s worth solving in engineering and what’s worth handing to the LLM.

ChatGPT memory

OpenAI’s memory feature stores explicit memories as flat, timestamped one-liners. No categories. No tags. On top of that, ChatGPT periodically generates “User Knowledge Memories”, AI-summarized dense paragraphs about the user that get regenerated when raw memory grows beyond some size. The AI-summarized dossier gets injected into every conversation, alongside any explicit one-liner memories the user has saved. Neither layer relies on retrieval.

The most useful reverse-engineering I read on this was Simon Willison’s “I really don’t like ChatGPT’s new memory dossier” (https://simonwillison.net/2025/May/21/chatgpt-new-memory/), which pulls apart the actual injected prompt and shows what the dossier looks like at the token level.
Shlok Khemani’s “ChatGPT Memory and the Bitter Lesson” (https://www.shloked.com/writing/chatgpt-memory-bitter-lesson) is the better piece on why OpenAI built it this way. The argument is that clever retrieval consistently loses to brute-force injection plus periodic re-summarization, and ChatGPT’s design is a bet on that asymmetry.

Deduplication in ChatGPT isn’t an engineering problem at all. It’s solved by the periodic summarization rewriting the dossier from scratch. If a user has five variants of “prefers concise writing” in their raw memory, the next summary regen collapses them. No similarity threshold, no embedding distance, no merge logic. The LLM does the work that infrastructure would do in a more traditional system.

What I took: flat-string storage with no categories, full injection instead of retrieval, and the bias toward making the LLM do dedup-flavored work instead of writing similarity code.

Claude Code memory

Claude Code’s memory model looks superficially like ChatGPT’s. Plain text, no database, no vectors. But the structure is meaningfully different and the difference is what made it useful as a reference. There are two separate memory systems sitting side by side, with different writers, different loading strategies, and different lifecycles.

The first is CLAUDE.md files. These are human-authored. They live in the filesystem at four different scopes: a managed system-wide policy file, a per-user file in ~/.claude/, a per-project file in ./CLAUDE.md, and a local override file that’s gitignored. Each scope determines load order. Managed loads first, then user, then project, then local, with later scopes overriding earlier. The scopes also determine version-control behavior. The project file is committed, the local file is not. Every CLAUDE.md file in scope is loaded in full at the start of every session. No index, no on-demand reads.

The second is auto memory. Auto memory is agent-authored. Claude writes notes for itself in ~/.claude/projects/