Elastic open-sourced Atlas, a system built on Elasticsearch that maintains three categories of memory for agents. Atlas integrates with agents via MCP and maintains per-user isolation of memories. When evaluated on question-answering capability, it scored 0.89 Recall@10.
Atlas addresses the problem of identifying the right context data to add to an agent's LLM prompt when users have a long history of interacting with the agent. Simply including the entire interaction history isn't a scalable solution, according to Elastic:
The standard workaround is to stuff prior context into the context window. That breaks down on cost, on latency, and on the well-documented "lost in the middle" effect, where models ignore facts placed far from the prompt's edges. A 1M-token context window is a scratchpad. It is not a memory system...What is missing is long-term memory: a persistent store that survives session end, scales to years of interaction, and lets you retrieve facts by content, by time, and by user.
The key concept in Atlas is that there are three types of memory identified by cognitive science: episodic, which captures "what happened"; semantic, "what's true"; and procedural, "what works." Atlas maintains a separate Elasticsearch index for each type of memory, since each type has its own rules and lifecycle.
Memories are created by storing each user input as an episodic memory event. Most of these decay out of memory, although some "become evidence for durable facts." This promotion happens during consolidation, in which an LLM is asked to consolidate the episodic events. The LLM identifies new facts (semantic memories) and stores each as a short sentence, along with the episodic memories that support it as evidence and any previous facts that the new fact supersedes.
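The relationship between a new fact, its evidence, and the facts it supersedes can be sketched in a few lines of Python. This is an illustrative sketch only, not Atlas's actual schema: the `SemanticFact` class, its field names, the `apply_facts` helper, and the in-memory dictionary standing in for an Elasticsearch index are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFact:
    """Hypothetical record for one consolidated fact (not Atlas's real schema)."""
    fact_id: str
    text: str                      # the fact, stored as a short sentence
    evidence_ids: list[str]        # IDs of the supporting episodic events
    supersedes: list[str] = field(default_factory=list)  # IDs of older facts
    active: bool = True


def apply_facts(store: dict[str, SemanticFact], new_facts: list[SemanticFact]) -> None:
    """Add new facts to the store and deactivate any facts they supersede."""
    for fact in new_facts:
        for old_id in fact.supersedes:
            old = store.get(old_id)
            if old is not None:
                old.active = False
        store[fact.fact_id] = fact


# Example: a user moves cities, so the newer fact replaces the older one.
store = {
    "f1": SemanticFact("f1", "User lives in Berlin.", evidence_ids=["e1"]),
}
apply_facts(store, [
    SemanticFact("f2", "User lives in Lisbon.", evidence_ids=["e7", "e8"],
                 supersedes=["f1"]),
])
active_facts = [f.text for f in store.values() if f.active]
```

Keeping the superseded fact in the store but marking it inactive, rather than deleting it, is a design choice of this sketch; it preserves the history of what was once believed to be true.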
Consolidation also updates procedural memory in two ways. First, it creates new "playbooks," each of which is a series of steps for solving a problem. Second, it updates success and failure counters for existing playbooks. These counts can bias retrieval results to boost playbooks that have been more successful.
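One way such counters could bias ranking is sketched below. The article does not describe the formula Atlas uses, so everything here is an assumption: the smoothed success rate, the `weight` parameter, and the function names are hypothetical and chosen only to show the idea of boosting playbooks with a better track record.

```python
def success_rate(successes: int, failures: int) -> float:
    """Smoothed success rate; a playbook with no history scores a neutral 0.5."""
    return (successes + 1) / (successes + failures + 2)


def boosted_score(relevance: float, successes: int, failures: int,
                  weight: float = 0.5) -> float:
    """Scale a retrieval relevance score up or down by the playbook's track record.

    `weight` (an assumed tuning knob) controls how strongly the counters
    influence the final score; a neutral track record leaves it unchanged.
    """
    return relevance * (1.0 + weight * (success_rate(successes, failures) - 0.5))


# Two playbooks with equal relevance but opposite track records.
reliable = boosted_score(1.0, successes=9, failures=1)
unreliable = boosted_score(1.0, successes=1, failures=9)
untested = boosted_score(1.0, successes=0, failures=0)
```

The smoothing keeps a playbook with a single success from outranking one with a long, mostly successful history purely on the strength of a perfect 1-for-1 record.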
Agents access memories via a single hybrid query across all these indices that uses Reciprocal Rank Fusion (RRF) over BM25 lexical search plus Jina v5 semantic search; the merged results are re-ranked using a cross-encoder reranker. Document-level security (DLS) ensures that queries only search memory documents belonging to that user.
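The fusion step can be illustrated with a small standalone Python function. This is a minimal sketch of the general RRF technique, not Atlas's code or Elasticsearch's implementation: the function name and the sample document IDs are made up for the example, and the constant `k=60` is a commonly used value rather than a setting confirmed for Atlas. Each document's fused score is the sum of `1 / (k + rank)` over every ranked list in which it appears.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking.

    Each list contributes 1 / (k + rank) for every document it contains,
    with ranks starting at 1. Ties are broken by document ID so the
    output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a lexical (BM25) query and a semantic query.
bm25_hits = ["a", "b", "c"]
semantic_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
# Documents found by both retrievers ("b", "c") rise above those found by one.
```

Because RRF uses only rank positions, it needs no calibration between BM25 scores and embedding similarities, which live on different scales. In a pipeline like the one described, the fused list would then be passed to the reranker.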
In a discussion about Atlas on Hacker News, some users wondered if using Elasticsearch as the storage was "overkill," and suggested other vector-capable databases such as SQLite. Another user replied:
"Any other vector DB" starts to fall apart once you need stuff like scripted scoring... Then it starts to be a question of, "do you need [Approximate Nearest Neighbor] for performance?"...And granted, brute-force is performant for far more vectors than most people give it credit for, but it definitely hits a wall well below 1 million if you want it to have webpage-type latency. Maintaining Elasticsearch isn't free, but picking an underpowered db and having to port to the right one is also quite time consuming.
The Atlas source code is available on GitHub.