At a glance
- Today’s AI agents don’t remember past interactions. They must repeatedly be fed relevant information or retrieve it from external sources, which becomes less efficient as they handle longer and more complex tasks. To scale agent capabilities, we need a more efficient way to retain and access information over time. Memora is a scalable memory system that dramatically increases agent productivity on long-horizon tasks by decoupling what is stored (rich memory content) from how it’s retrieved (lightweight abstractions and cue anchors), balancing abstraction and specificity.
- Memora sets a new state of the art on LoCoMo and LongMemEval, outperforming Mem0, RAG, and full-context inference while using up to 98% fewer context tokens. The Memora paper is published at ICML 2026, and the Memora code is available at https://github.com/microsoft/Memora.
Imagine a workplace AI assistant helping you run a multi-month project. Over weeks of conversations, you share constraints, agree on milestones, revise deadlines, and surface dozens of stakeholder preferences. When you later ask it to draft an update for a colleague, it should recall not just the latest decision but the journey that got you there: what was tried, what was ruled out, who weighed in. Today’s AI agents struggle with this. Modern large language models (LLMs) are powerful reasoners, but they are effectively stateless: every session starts from zero, every long conversation forces the model to re-read its entire history, and every new piece of information is either stored as raw text (fragmented and noisy) or compressed into a vague summary (precise details lost). As AI assistants and autonomous agents move into long-horizon deployments, such as copilots that track a project for months or research agents that build up domain expertise over extended use, the absence of a principled memory system has become the critical bottleneck.
A growing line of work has begun to fill this gap. Systems like Mem0 extract atomic facts from conversations; retrieval-augmented generation (RAG) approaches index raw text fragments for later recall; and graph-based memory systems such as Zep and GraphRAG impose structure through entity relations. Each represents real progress, yet each runs into the same wall: existing designs force a tradeoff between specificity (preserving fine-grained detail) and abstraction (organizing memory efficiently as it grows). Memora is built to give agents both.
What is Memora
Memora is an agentic memory framework designed for long-horizon AI agents. Its central insight is to decouple what is stored from how it is retrieved. Memory content, such as a project timeline or a multi-turn discussion about constraints, can remain rich and expressive, while a separate, lightweight structural layer handles indexing and retrieval. The result is a memory system that scales: it consolidates related information into stable units, surfaces fine-grained details when they matter, and lets the agent navigate its own history without re-reading everything. On standard long-conversation benchmarks, Memora sets a new state of the art while using up to 98% fewer tokens than would be consumed by placing the full history into context.
Why this is hard: the abstraction–specificity tension
Existing memory systems fall into two extremes. Content-fragmentation systems, such as RAG and Mem0, embed extracted facts or text fragments directly. This preserves detail but produces brittle, isolated entries that lose narrative coherence. Coarse-abstraction systems compress experience into compact summaries. They are efficient, but summarization strips away the constraints, edge cases, and numeric details that make memory useful in the first place. Graph-based systems add structure on top of content, yet still rely on the content itself for retrieval and typically require rigid ontologies that don’t generalize across domains. None of these resolves the underlying tension between abstraction (which keeps memory efficient) and specificity (which gives memory utility).
How Memora works
Memora resolves this tension through a harmonic organization. Each memory entry has two components: a primary abstraction, a short phrase (6–8 words) that captures what the memory is fundamentally about, and a memory value that holds the rich content itself. Crucially, only the primary abstraction is embedded for similarity search; the value is never retrieved by matching against its own content. This separation means that new information about an evolving topic merges into the existing entry under the same primary abstraction rather than fragmenting into a chain of partial duplicates. Complementing primary abstractions, cue anchors are short, context-aware tags extracted from each memory’s value that provide alternative access paths to the same memory. They function as flexible, organically generated metadata.
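The storage side of this design can be illustrated with a minimal Python sketch. It assumes a toy bag-of-words cosine similarity in place of a real embedding model and a hypothetical merge threshold of 0.6; the names `MemoryEntry`, `MemoryStore`, `embed`, and `merge_threshold` are illustrative and are not taken from the Memora codebase. What it shows is the property described above: only `primary_abstraction` is embedded, and a write whose abstraction matches an existing entry is folded into that entry instead of becoming a new fragment.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class MemoryEntry:
    primary_abstraction: str                  # short phrase; the ONLY embedded field
    value: list[str]                          # rich content; stored, never searched
    cue_anchors: set[str] = field(default_factory=set)
    key: Counter = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Embed the primary abstraction only; the value plays no part in search.
        self.key = embed(self.primary_abstraction)


class MemoryStore:
    def __init__(self, merge_threshold: float = 0.6) -> None:
        self.entries: list[MemoryEntry] = []
        self.merge_threshold = merge_threshold  # hypothetical cutoff, not from the paper

    def write(self, abstraction: str, content: str, anchors: set[str]) -> MemoryEntry:
        query = embed(abstraction)
        best = max(self.entries, key=lambda e: cosine(query, e.key), default=None)
        if best is not None and cosine(query, best.key) >= self.merge_threshold:
            # Same topic: fold the update into the existing entry
            # instead of creating a near-duplicate fragment.
            best.value.append(content)
            best.cue_anchors |= anchors
            return best
        entry = MemoryEntry(abstraction, [content], set(anchors))
        self.entries.append(entry)
        return entry


store = MemoryStore()
store.write("Project Orion timeline", "Prototype due March 15.",
            {"Project Orion prototype schedule"})
store.write("Updated Project Orion timeline", "Prototype moved to April 1.",
            {"Dave Project Orion update"})
store.write("Quarterly hiring plan", "Two engineers in Q3.", set())
print(len(store.entries))  # 2: the timeline update merged; the hiring plan is new
```

Because the merge decision looks only at abstractions, the two timeline writes land in one entry whose value keeps both versions, which is the anti-fragmentation behavior the paragraph describes. The example data (the March 15 date, the hiring plan) is invented for illustration.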
To make this concrete, suppose a user says, “Dave and Sarah agreed to push the prototype to April 1, the pilot to May 2, and the MVP to May 30.” A knowledge-graph system would need predefined entity types and relation schemas, such as Person → agreed_on → Milestone → has_date → Date, and any new relation type would require extending the schema. In Memora, the primary abstraction “Updated Project Orion timeline agreed by Dave and Sarah” serves as the canonical access point, while cue anchors such as “Dave Project Orion update,” “Project Orion prototype schedule,” and “Project Orion pilot timeline” provide alternative retrieval paths, all without committing to an ontology. Later queries about Dave’s recent contributions, the prototype schedule, or pilot timing can each route to the same underlying memory through a different cue, with the full detail preserved in the memory value.
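The routing in this example can be sketched in Python, again assuming a toy bag-of-words similarity as a stand-in for real embeddings. `CueIndex`, `add`, and `route` are hypothetical names, and the offsite memory is an invented distractor. The sketch treats the primary abstraction and each cue anchor as separate keys that all point to one stored value, so differently worded queries converge on the same memory without any declared entity types or relations.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CueIndex:
    """Every access path (primary abstraction or cue anchor) is a separate key
    pointing at one stored value; no entity types or relation schema are needed."""

    def __init__(self) -> None:
        self.values: dict[int, str] = {}
        self.keys: list[tuple[Counter, str, int]] = []  # (embedding, key text, memory id)

    def add(self, primary: str, value: str, anchors: list[str]) -> int:
        mem_id = len(self.values)
        self.values[mem_id] = value
        for key in [primary, *anchors]:
            self.keys.append((embed(key), key, mem_id))
        return mem_id

    def route(self, query: str) -> tuple[str, int]:
        """Return the best-matching key and the id of the memory it points to."""
        q = embed(query)
        _, key, mem_id = max(self.keys, key=lambda k: cosine(q, k[0]))
        return key, mem_id


index = CueIndex()
orion = index.add(
    "Updated Project Orion timeline agreed by Dave and Sarah",
    "Dave and Sarah agreed to push the prototype to April 1, "
    "the pilot to May 2, and the MVP to May 30.",
    ["Dave Project Orion update", "Project Orion prototype schedule",
     "Project Orion pilot timeline"],
)
offsite = index.add("Team offsite logistics", "The offsite is in Seattle on June 12.",
                    ["Seattle offsite venue"])

for q in ["What did Dave contribute recently?", "What is the prototype schedule?",
          "How is pilot timing looking?"]:
    key, mem_id = index.route(q)
    print(f"{q!r} -> via {key!r} -> {index.values[mem_id]}")
```

Each of the three queries matches a different cue anchor yet returns the same full memory value. Adding a new kind of fact would only mean adding another anchor string, not extending a schema.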
On top of this representation, Memora introduces a policy-guided retriever that treats memory access as an active reasoning process. Rather than returning the top-k semantically similar items in a single shot, the policy retriever iteratively refines its query, expands through cue anchors to surface related-but-not-similar memories, and decides when to stop. This lets the agent navigate to relevant non-local context that pure semantic search would miss, chasing multi-hop dependencies the way a human would when recalling connected events. The retrieval policy can be either hand-prompted with a strong LLM or distilled into a much smaller model via reinforcement learning.
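The control flow of such a retriever can be sketched as a loop in which, after a one-shot semantic seed, a policy repeatedly chooses either to expand into a memory linked by a shared cue anchor or to stop. This minimal Python sketch rests on several assumptions: `toy_policy` is a hypothetical rule-based stand-in for the LLM policy (prompted or RL-distilled) described above; anchors link memories only on exact string match; query refinement is modeled crudely as appending the anchors gathered so far; and similarity is a toy bag-of-words cosine. None of these names, and none of the example memories, come from the Memora codebase.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Memory:
    abstraction: str
    anchors: set[str]
    value: str


def neighbors(ids: list[int], memories: dict[int, Memory]) -> set[int]:
    """Unretrieved memories one hop away via a shared cue anchor (exact match here)."""
    anchors = set().union(*(memories[i].anchors for i in ids))
    return {j for j, m in memories.items() if j not in ids and m.anchors & anchors}


def toy_policy(query: str, retrieved: list[int], frontier: set[int],
               memories: dict[int, Memory]) -> Optional[int]:
    """Hypothetical stand-in for the LLM policy: a memory id to expand, or None to stop."""
    if not frontier:
        return None
    # Crude query refinement: append the anchors gathered so far.
    refined = embed(query + " " + " ".join(a for i in retrieved for a in memories[i].anchors))

    def score(j: int) -> float:
        m = memories[j]
        return cosine(refined, embed(m.abstraction + " " + " ".join(m.anchors)))

    best = max(sorted(frontier), key=score)
    return best if score(best) > 0 else None  # stop if nothing in reach is relevant


def retrieve(query: str, memories: dict[int, Memory],
             policy=toy_policy, max_steps: int = 3) -> list[int]:
    q = embed(query)
    # Seed with ordinary one-shot similarity over primary abstractions only.
    retrieved = [max(memories, key=lambda i: cosine(q, embed(memories[i].abstraction)))]
    for _ in range(max_steps):
        choice = policy(query, retrieved, neighbors(retrieved, memories), memories)
        if choice is None:
            break
        retrieved.append(choice)
    return retrieved


memories = {
    0: Memory("Updated Project Orion timeline agreed by Dave and Sarah",
              {"Project Orion pilot timeline", "Dave Project Orion update"},
              "Dave and Sarah agreed to push the prototype to April 1, the pilot to May 2, "
              "and the MVP to May 30."),
    1: Memory("Contoso selected as first customer site",
              {"Project Orion pilot timeline", "Contoso Chicago office"},
              "The pilot will run at Contoso's Chicago office."),
    2: Memory("Contoso security review requirements", {"Contoso Chicago office"},
              "Contoso requires a security review two weeks before any on-site deployment."),
    3: Memory("Team offsite logistics", {"Seattle offsite venue"},
              "The offsite is in Seattle on June 12."),
}

for i in retrieve("When does Dave's pilot start?", memories):
    print(memories[i].value)
```

For "When does Dave's pilot start?", one-shot similarity finds only the timeline entry. The loop then hops through the shared pilot anchor to the customer-site memory and from there to the security-review requirement, whose primary abstraction shares no words with the query, and stops once no anchor-linked candidates remain. The unrelated offsite memory is never touched.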
Results
We evaluate Memora on two long-context benchmarks: LoCoMo, where dialogues average 600 turns, and LongMemEval, with 115,000-token contexts. Memora achieves new state-of-the-art performance on both: 86.3% LLM-judge accuracy on LoCoMo and 87.4% on LongMemEval, outperforming RAG, Mem0, Nemori, Zep, LangMem, and even full-context inference. The gap is largest on multi-hop reasoning, where Memora’s ability to traverse cue anchors pays the biggest dividends. The efficiency story is just as striking: Memora stores roughly half the memory entries per conversation that Mem0 does (344 vs. 651) and reduces token consumption by up to 98% relative to full-context inference. Less to read, less to store, better answers.
Looking forward
Memora’s design has implications beyond benchmark performance. We see this work as a step toward AI agents that can sustain long-term collaboration with users and accumulate organizational knowledge over months and years, not just within a single session. Building on this foundation, we are pursuing several complementary directions. MemLoop explores how memory systems can learn from retrieval and task failures, attribute errors to specific stages of the memory pipeline, and improve themselves over time. Deferred Memory investigates when memory construction should be postponed until sufficient context, evidence, or future utility becomes available, rather than committing prematurely to what should be stored. Group Memory examines how knowledge can be shared across teams and agents while preserving provenance, access boundaries, ownership, and sensitive context. We release our code alongside the paper and invite the community to build on this representation and explore what becomes possible when AI agents are no longer stateless.
Acknowledgements
We would like to thank Shantanu Dixit (Research Fellow), Paramaguru Harimurugan (Research Fellow), Rujia Wang, Victor Rühle, and Robert Sim for contributing to this project.