{"slug": "how-to-improve-the-memory-of-ai-agents", "title": "How to improve the memory of AI agents", "summary": "AI agents suffer from limited memory due to the stateless nature of large language models, but retrieval-augmented generation (RAG) offers a solution by offloading long-term memory to external storage. RAG provides three types of memory—episodic, semantic, and procedural—that allow agents to persist and retrieve contextual data, improving performance and reducing glitches.", "body_md": "When you use an AI agent, the more contextual data the agent has about the job, the better it will perform.\n\nBut agents don’t have much memory, since the [large language models](https://www.infoworld.com/article/2335213/large-language-models-the-foundations-of-generative-ai.html) (LLMs) they depend on are stateless. When that memory runs out, the agent glitches, hangs, or spews out nonsense. Tactics like truncating or compacting agent memory can compensate for this, but they’re stopgaps, not real solutions.\n\nA better answer to the AI agent memory crunch is memory that lives and persists outside the agent itself. The agent’s memory is still used for immediate work, but the longer-term, big-picture details get offloaded to another service and retrieved on demand.\n\nThe term for this is [retrieval-augmented generation](https://www.infoworld.com/article/2335814/what-is-retrieval-augmented-generation-more-accurate-and-reliable-llms.html), or RAG. It has become as significant a technology as the agents and LLMs themselves, because it expands their capabilities without requiring any changes to the underlying model.\n\nLLMs have what’s called a “context window”: a block of working memory, measured in tokens, that holds the input the model is processing. The maximum size of the window varies from model to model. 
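To make the idea of a fixed-size window concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular model manages its context. The ContextWindow class and its add method are hypothetical, and a whitespace word count stands in for a real tokenizer. When a new message would overflow the budget, the oldest messages are evicted and handed back to the caller. That is the point where an agent either loses the information or offloads it to external storage.
```python
from collections import deque


class ContextWindow:
    # Hypothetical model of an LLM context window as a fixed token budget.
    # Assumption: tokens are approximated by splitting on whitespace; real
    # models use their own tokenizers, so actual counts will differ.

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.messages = deque()  # oldest message on the left
        self.used = 0            # tokens currently held in the window

    @staticmethod
    def count_tokens(text):
        return len(text.split())

    def add(self, message):
        # Add a message, evicting the oldest messages until it fits.
        # Returns the evicted messages so the caller can offload them to
        # long-term storage instead of silently losing them.
        cost = self.count_tokens(message)
        if cost > self.max_tokens:
            raise ValueError('message is larger than the whole window')
        evicted = []
        while self.used + cost > self.max_tokens:
            old = self.messages.popleft()
            self.used -= self.count_tokens(old)
            evicted.append(old)
        self.messages.append(message)
        self.used += cost
        return evicted


window = ContextWindow(max_tokens=10)
window.add('user asks about the build')
window.add('agent reads the config')
spilled = window.add('agent proposes a fix')
print(spilled)                # ['user asks about the build']
print(list(window.messages))  # ['agent reads the config', 'agent proposes a fix']
```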
The larger the context window, the more information the model can take in at once (e.g., a file of code to analyze), and the longer and more complex the conversation it can sustain.\n\nThe premise behind RAG is simple: Use the LLM’s context window for information that matters in the immediate conversation, and use persistent external storage, accessed through RAG, for everything else. The model’s context window serves as short-term memory, and RAG serves as long-term memory.\n\nWhat’s more, RAG storage comes in a few different forms. A 2024 paper entitled [“Cognitive Architectures for Language Agents”](https://arxiv.org/abs/2309.02427) goes into great detail about them, but it’s worth breaking them down in plainer language.\n\nLet’s examine the three basic ways RAG storage works: episodic memory, semantic memory, and procedural memory.\n\nEpisodic memory stores records of what the LLM did at earlier points in time: a decision the LLM made, and the result of that decision. These experiences can be ordered by time to produce what the above paper describes as “history event flows,” the sequences of steps that led to some particular output. Through episodic memory, the LLM can reconstruct a decision or process it previously performed, and use that experience to guide future action.\n\nSemantic memory stores structured data “about the world and [the agent] itself,” as the paper puts it. This could be as simple as a basic key/value store for user preferences, or it could involve something more complex, such as a store of [vector embeddings](https://www.ibm.com/think/topics/vector-embedding). The point is to give the agent a way to look up such “world knowledge” readily, and to have it available in a format the agent can use as-is.\n\nIt also helps if semantic memory is under the agent designer’s control. 
As the paper notes, an external source like Wikipedia is “an external environment that may be unexpectedly modified by other users,” but an offline version (essentially, a static point-in-time snapshot) would not have this problem.\n\nOn the surface, procedural memory sounds a little like episodic memory: it’s used to store things like reasoning processes or learning procedures. But procedural memory exists to let the LLM reproduce the steps of a process, rather than merely record that it once followed that process. That way, those procedures can be performed repeatedly without having to be rediscovered or re-created from scratch each time.\n\nAn important point about all three kinds of memory: they favor reads over writes. For instance, semantic memory isn’t written to very often, though it can be useful for the agent to record new facts it learns about its world. By contrast, letting the agent write freely to procedural memory might “introduce bugs or allow an agent to subvert its designers’ intentions,” as the paper notes.\n\nWhile using RAG is standard practice on the agent’s side, there’s no single canonical way to implement RAG storage. The storage layer is typically a vector database, though it doesn’t have to be a dedicated one: [many modern databases support vector functionality](https://www.infoworld.com/article/4169087/your-ai-doesnt-need-another-database.html).\n\nWhere that memory lives is also open-ended. A service that provides access to an LLM, for instance, could include RAG on the server side as part of its package of offerings. A locally run LLM could have RAG storage services running side by side on the same system that hosts the model. The downside of this last approach is that the system will require that much more local storage and processing power.\n\nRAG storage also requires its own upkeep. Each agent and use case will impose different demands on how to manage that storage. 
Older data, for instance, might need to be aged out periodically, or given less weight than newer or more frequently accessed data.\n\nFinally, while multiple agents can share the same RAG storage, they shouldn’t do so indiscriminately. At the very least, each agent should operate in its own context so that data and use cases from one agent don’t interfere with others. A more complex and ambitious approach is to use a tool like [Microsoft AutoGen](https://www.microsoft.com/en-us/research/project/autogen) to build shared multi-agent RAG contexts.", "url": "https://wpnews.pro/news/how-to-improve-the-memory-of-ai-agents", "canonical_source": "https://www.infoworld.com/article/4189492/how-to-improve-the-memory-of-ai-agents.html", "published_at": "2026-07-01 09:00:00+00:00", "updated_at": "2026-07-01 09:24:54.826360+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-research", "generative-ai"], "entities": ["retrieval-augmented generation", "RAG", "large language models", "LLMs", "context window", "episodic memory", "semantic memory", "procedural memory"], "alternates": {"html": "https://wpnews.pro/news/how-to-improve-the-memory-of-ai-agents", "markdown": "https://wpnews.pro/news/how-to-improve-the-memory-of-ai-agents.md", "text": "https://wpnews.pro/news/how-to-improve-the-memory-of-ai-agents.txt", "jsonld": "https://wpnews.pro/news/how-to-improve-the-memory-of-ai-agents.jsonld"}}