How to improve the memory of AI agents


Source: infoworld.com · Published: 2026-07-01T09:00Z · Topic: large-language-models


AI agents have limited memory because the large language models they depend on are stateless. Retrieval-augmented generation (RAG) addresses this by offloading long-term memory to external storage. That storage can hold three kinds of memory (episodic, semantic, and procedural), which let agents persist and retrieve contextual data, improving performance and reducing glitches.


When you use an AI agent, the more contextual data the agent has about the job, the better it will perform.

But agents don’t have much memory, since the large language models (LLMs) they depend on are stateless. When an agent’s working memory runs out, it glitches, hangs, or spews nonsense. Tactics like truncating or compacting agent memory can ease the problem, but they’re stopgaps, not real solutions.

A better answer to the AI agent memory crunch is memory that lives and persists outside of the agent itself. The agent’s memory is still used for immediate work, but the longer-term, big-picture details get offloaded to another service and retrieved on demand.

The term for this is retrieval-augmented generation, or RAG. It has become as significant a technology as the agents and LLMs themselves, as it expands their capabilities in place.

LLMs have what’s called a “context window”: a block of working memory, limited to a maximum size measured in tokens, that’s used for processing input. The maximum size of the window varies depending on the model. The larger the context window, the more information the model can process (e.g., a file containing code for analysis), and the more complex the conversation it can sustain.

The premise behind RAG is simple: Use the LLM’s context window for information that matters in the immediate conversation, and use persistent storage systems (RAG) for information outside of that. The model’s context window serves as short-term memory, and RAG serves as long-term memory.
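The split described above can be illustrated with a small sketch. This is not how any particular agent framework does it; `build_prompt`, the word-overlap retrieval, and the character budget (standing in for a token budget) are all assumptions made for illustration.

```python
def build_prompt(question, recent_turns, long_term_store, max_chars=2000):
    """Assemble a prompt from retrieved long-term notes plus recent turns."""
    # Long-term memory (hypothetical, naive retrieval): keep stored notes
    # that share at least one word with the question.
    words = set(question.lower().split())
    retrieved = [note for note in long_term_store
                 if words & set(note.lower().split())]

    # Short-term memory: keep the newest turns that still fit the budget.
    budget = max_chars - len(question) - sum(len(n) for n in retrieved)
    kept = []
    for turn in reversed(recent_turns):
        if len(turn) > budget:
            break
        kept.append(turn)
        budget -= len(turn)
    kept.reverse()  # restore chronological order

    return "\n".join(retrieved + kept + [question])


store = ["The user prefers tabs over spaces.", "Deploys run on Fridays."]
turns = ["user: hi", "agent: hello"]
prompt = build_prompt("How does the user indent code?", turns, store)
```

A real system would count tokens rather than characters and would rank notes by relevance, but the division of labor is the same: only what matters now goes into the window.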

What’s more, RAG storage comes in a few different forms. A 2024 paper entitled “Cognitive Architectures for Language Agents” goes into great detail about them, but it’s worth breaking them down in plainer language.

Let’s examine the three basic ways RAG storage works: episodic memory, semantic memory, and procedural memory.

Episodic memory stores data generated from some previous point in time by the LLM — a decision the LLM made, and the result of that decision. These experiences can be ordered by time to produce what the above paper describes as “history event flows”, or the processes that generated some particular output. Through episodic memory, the LLM can reconstruct a decision or process it previously performed, and use that experience to guide future action.
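As a rough illustration of episodic memory, the sketch below keeps a time-ordered log of decisions and their results. The `Episode` and `EpisodicMemory` names and the keyword-based recall are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Episode:
    decision: str
    result: str
    timestamp: datetime


class EpisodicMemory:
    def __init__(self):
        self._episodes = []

    def record(self, decision, result, timestamp=None):
        # Default to the current UTC time when no timestamp is supplied.
        when = timestamp or datetime.now(timezone.utc)
        episode = Episode(decision, result, when)
        self._episodes.append(episode)
        return episode

    def history(self):
        """Return all episodes ordered oldest to newest."""
        return sorted(self._episodes, key=lambda e: e.timestamp)

    def recall(self, keyword):
        """Return past episodes whose decision mentions the keyword."""
        return [e for e in self.history()
                if keyword.lower() in e.decision.lower()]
```

An agent could call `recall("build")` before attempting a build again, then feed the matching decisions and results back into its context window.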

Semantic memory stores structured data “about the world and [the agent] itself”, as the paper puts it. This could be as simple as using a basic key/value store for user preferences, or could involve a more complex system like vector embedding. The point is to give the agent a way to look up such “world knowledge” readily, and to have it available in a format the agent can use as-is.

It also helps for semantic memory to be controllable. As the paper notes, an external source like Wikipedia is “an external environment that may be unexpectedly modified by other users,” but an offline version (essentially, a static point-in-time snapshot) would not have this problem.
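Both options mentioned above for semantic memory, a key/value store and a vector lookup, fit in a few lines. In this sketch the class name, method names, and two-dimensional vectors are made up for illustration; a real system would get its vectors from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticMemory:
    def __init__(self):
        self.preferences = {}  # simple key/value facts, e.g. user settings
        self._facts = []       # (vector, text) pairs for similarity lookup

    def add_fact(self, vector, text):
        self._facts.append((vector, text))

    def nearest(self, query_vector, k=1):
        """Return the k stored facts most similar to the query vector."""
        ranked = sorted(self._facts,
                        key=lambda fact: cosine(query_vector, fact[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


memory = SemanticMemory()
memory.preferences["indent"] = "tabs"
memory.add_fact([1.0, 0.0], "Paris is the capital of France.")
memory.add_fact([0.0, 1.0], "Python blocks are defined by indentation.")
closest = memory.nearest([0.9, 0.1])
```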

On the surface, procedural memory sounds a little like episodic memory: it’s used to store things like reasoning processes or learning procedures. But procedural memory is specifically for allowing the LLM to reproduce the steps of a process, rather than the mere fact that it followed such a process. It allows those procedures to be performed repeatedly without having to be re-discovered or re-created from scratch each time.
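One way to picture the difference: episodic memory would record that a cleanup happened, while procedural memory stores the steps so they can be replayed. The sketch below assumes procedures are ordered lists of Python callables; the `ProceduralMemory` class and its method names are hypothetical.

```python
class ProceduralMemory:
    def __init__(self):
        self._procedures = {}

    def learn(self, name, steps):
        # Store an immutable copy so later changes to the caller's list
        # cannot silently alter the saved procedure.
        self._procedures[name] = tuple(steps)

    def run(self, name, value):
        """Replay the named procedure, feeding each step's output to the next."""
        for step in self._procedures[name]:
            value = step(value)
        return value


procedures = ProceduralMemory()
procedures.learn("normalize", [str.strip, str.lower])
cleaned = procedures.run("normalize", "  Hello World ")
```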

An important thing about each of these kinds of memories: they favor reads over writes. For instance, semantic memory isn’t written to very often, though it can be useful for the agent to record new facts it learns about its world. By contrast, letting the agent write freely to procedural memory might “introduce bugs or allow an agent to subvert its designers’ intentions”, as the paper notes.
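The read-heavy, write-cautious policy can be enforced in code rather than left to convention. The following is a minimal sketch under that assumption; the `GuardedMemory` class and its `writable` flag are illustrative names, not part of any library.

```python
class GuardedMemory:
    """A store that always allows reads but only allows writes if enabled."""

    def __init__(self, initial, writable=False):
        self._data = dict(initial)
        self._writable = writable

    def read(self, key, default=None):
        return self._data.get(key, default)

    def write(self, key, value):
        if not self._writable:
            raise PermissionError(
                f"memory is read-only; refused write to {key!r}")
        self._data[key] = value


# Semantic memory: the agent may occasionally record new facts.
semantic = GuardedMemory({"editor": "vim"}, writable=True)
semantic.write("shell", "zsh")

# Procedural memory: locked, so the agent cannot rewrite its own procedures.
procedural = GuardedMemory({"deploy": ("test", "build", "release")})
```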

While RAG is a well-established pattern on the agent’s side, there’s no one canonical way to implement RAG storage. The storage layer is typically a vector database, although many general-purpose databases now support vector functionality as well. Where that memory lives is also open-ended. A service that provides access to an LLM, for instance, could include RAG on the server side as part of its package of offerings. A locally run LLM could have RAG storage services running side by side on the same system that hosts the model. The downside of this last approach is that the system will require that much more local storage and processing power.

RAG storage also requires its own separate upkeep. Each agent and use case will impose different demands on how to manage that storage. Older data, for instance, might need to be aged out periodically, or given less weight than newer or more frequently accessed data.
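Aging out and down-weighting can be expressed as a scoring rule. The sketch below is one possible scheme, not a standard one: the half-life decay, the logarithmic access bonus, and the `age_days` record field are all assumptions chosen for illustration.

```python
import math


def recency_weight(age_days, half_life_days=30.0):
    """Halve a record's weight every `half_life_days`."""
    return 0.5 ** (age_days / half_life_days)


def score(relevance, age_days, access_count, half_life_days=30.0):
    """Combine relevance with recency and a mild bonus for frequent access."""
    access_bonus = 1 + math.log1p(access_count)
    return relevance * recency_weight(age_days, half_life_days) * access_bonus


def prune(records, max_age_days):
    """Age out records older than the cutoff."""
    return [r for r in records if r["age_days"] <= max_age_days]


records = [
    {"text": "Q1 roadmap", "age_days": 400},
    {"text": "current sprint goals", "age_days": 3},
]
fresh = prune(records, max_age_days=365)
```

The right half-life and cutoff depend on the use case: a support agent may want stale tickets to fade within weeks, while a coding agent may need architectural decisions to persist for years.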

Finally, while multiple agents can share the same RAG storage, they shouldn’t do so indiscriminately. At the very least, each agent should operate in its own context so that data and use cases from one agent don’t interfere with others. A more complex and ambitious approach is to use a tool like Microsoft AutoGen to build shared multi-agent RAG contexts.
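The simpler of those two approaches, one context per agent, can be sketched as a namespaced store. This does not use or imitate the AutoGen API; `NamespacedStore` and its methods are hypothetical names for illustration.

```python
from collections import defaultdict


class NamespacedStore:
    """One backing store, partitioned so agents do not overwrite each other."""

    SHARED = "shared"

    def __init__(self):
        self._spaces = defaultdict(dict)

    def put(self, agent_id, key, value, shared=False):
        # Writes go to the agent's own namespace unless sharing is explicit.
        space = self.SHARED if shared else agent_id
        self._spaces[space][key] = value

    def get(self, agent_id, key, default=None):
        # The agent's own namespace wins; the shared one is the fallback.
        own = self._spaces.get(agent_id, {})
        if key in own:
            return own[key]
        return self._spaces.get(self.SHARED, {}).get(key, default)


store = NamespacedStore()
store.put("planner", "goal", "ship v2")
store.put("coder", "goal", "fix failing tests")
store.put("planner", "style", "PEP 8", shared=True)
```

Here both agents keep their own `goal`, while `style` is visible to every agent because it was shared deliberately.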
