Your AI agent forgets users the moment a session closes. They call back tomorrow and have to re-explain who they are, what they need, and what you discussed before. That’s not a minor UX annoyance — it’s a blocker for any agent you want running in production. Amazon Bedrock AgentCore Memory is the managed answer. With the AgentCore platform reaching full general availability at AWS Summit New York this week, it’s worth understanding what Memory actually does and whether it belongs in your stack.
A Two-Tier Architecture You Need to Understand
AgentCore Memory runs two distinct layers, and the split between them is the most important thing to grasp before writing a single line of code.
Short-term memory is synchronous and session-scoped. You call `create_event()` to write raw messages (user input, assistant responses, tool calls), and they’re available immediately within that session. Think of it as the agent’s working memory: fast, current, and time-bounded. You set retention anywhere from 7 days to one year.
Long-term memory is asynchronous and cross-session. After events land in short-term memory, an extraction process runs in the background. An LLM analyzes what happened and distills it into structured insights that persist indefinitely. This is where the agent builds an actual model of the user over time.
The async gap matters. There is a delay between storing an event and having the extracted insight available in long-term memory. If you’re building flows that assume instant extraction, you’ll hit edge cases. Design with the lag in mind.
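One way to design around the lag is to poll for extracted records with a capped exponential backoff instead of assuming they exist right after the write. The sketch below is a minimal illustration, not part of the AgentCore SDK: `wait_for_memories` and its parameters are hypothetical names, and `fetch` stands in for whatever retrieval call your agent uses.

```python
import time
from typing import Callable, List, Optional


def wait_for_memories(
    fetch: Callable[[], List[dict]],
    timeout_s: float = 60.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[List[dict]]:
    """Poll `fetch` until it returns records or the sleep budget runs out.

    `fetch` is a hypothetical zero-argument callable wrapping your retrieval
    call. The timeout counts time spent sleeping, not time spent in `fetch`.
    """
    waited = 0.0
    delay = initial_delay_s
    while True:
        records = fetch()
        if records:
            return records
        if waited >= timeout_s:
            return None  # extraction has not produced anything yet
        step = min(delay, timeout_s - waited)
        sleep(step)
        waited += step
        delay = min(delay * 2, max_delay_s)  # back off, but cap the wait
```

If the helper returns `None`, the agent can fall back to the short-term events for that session rather than blocking the response.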
Three Strategies That Decide What Your Agent Retains
Long-term memory is not one thing. AWS built three extraction strategies, and each one produces a different kind of knowledge from the same raw events.
Summary strategy builds rolling conversation summaries organized by topic. The output is XML-tagged, which keeps multiple topics separable. Useful for maintaining conversational continuity — the agent knows the arc of your relationship, not just isolated facts.
Semantic strategy extracts factual statements and stores them with vector embeddings for similarity search. If a user mentions they run a bakery, that becomes a retrievable fact. If they say they’re building with React, that goes in too. It’s the agent building a knowledge model of the user’s world.
User preferences strategy tracks behavioral signals: communication style, tool preferences, workflow choices. The agent learns that this user wants concise answers, prefers Python examples, and hates being asked to confirm things twice.
All three strategies run in parallel on the same short-term events. Pick the ones your use case actually needs — you are billed per record processed.
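To make the semantic strategy's "similarity search" concrete, here is a toy sketch of ranking stored facts by cosine similarity. It does not show how AgentCore implements retrieval. The three-dimensional vectors are hand-made stand-ins for real embeddings, and `cosine`, `top_k`, and `FACTS` are hypothetical names used only for illustration.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], facts: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k facts whose vectors are closest to the query vector."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in facts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings": axes loosely mean (business, frontend tech, style).
FACTS = {
    "User runs a bakery": [0.9, 0.1, 0.0],
    "User is building with React": [0.1, 0.9, 0.1],
    "User prefers concise answers": [0.0, 0.1, 0.9],
}

# A query about frontend tooling should rank the React fact first.
ranked = top_k([0.2, 0.8, 0.0], FACTS, k=2)
```

A real system would produce the vectors with an embedding model, but the ranking step works the same way.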
Add AgentCore Memory to an Agent in Three Steps
Install the SDK and create a memory store with your chosen strategies:
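```bash
pip install bedrock-agentcore
```

```python
from bedrock_agentcore.memory import MemoryClient

client = MemoryClient(region_name="us-east-1")

# Each strategy is keyed by its type and carries a name. The names and
# namespaces below are illustrative choices, not required values.
memory = client.create_memory(
    name="SupportAgentMemory",
    strategies=[
        {"summaryMemoryStrategy": {
            "name": "SessionSummary",
            "namespaces": ["/summaries/{actorId}/{sessionId}"]}},
        {"userPreferenceMemoryStrategy": {
            "name": "UserPreferences",
            "namespaces": ["/preferences/{actorId}"]}},
        {"semanticMemoryStrategy": {
            "name": "UserFacts",
            "namespaces": ["/facts/{actorId}"]}},
    ],
)
memory_id = memory["id"]
```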
Store events as conversations happen. Two IDs to get right: `actor_id` identifies the human user across all sessions; `session_id` identifies the specific conversation. Mixing these up is the most common implementation mistake.
```python
client.create_event(
    memory_id=memory_id,
    actor_id="user_84",        # persists across sessions
    session_id="session_001",  # scoped to this conversation
    messages=[
        ("I am having trouble with order #12345", "USER"),
        ("Let me look that up for you.", "ASSISTANT"),
        ("lookup_order(order_id='12345')", "TOOL"),
        # Message roles are USER, ASSISTANT, TOOL, and OTHER,
        # so the tool's output is also recorded under TOOL.
        ("Order delayed 3 days due to supply issues.", "TOOL"),
    ],
)
```
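A simple way to avoid mixing up the two IDs is to derive them in different ways: the actor ID deterministically from a stable user key, the session ID freshly for each conversation. The helpers below are a hypothetical sketch (the function names and the `user_`/`session_` prefixes are assumptions, not SDK features).

```python
import hashlib
import uuid


def actor_id_for(user_key: str) -> str:
    """Map a stable user key (e.g. an account email) to the same ID every time."""
    normalized = user_key.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"user_{digest[:16]}"


def new_session_id() -> str:
    """Generate a fresh ID for each conversation."""
    return f"session_{uuid.uuid4().hex}"


# Same person, two conversations: one actor ID, two session IDs.
actor = actor_id_for("ada@example.com")
first_session = new_session_id()
second_session = new_session_id()
```

Hashing the key also keeps raw identifiers such as email addresses out of the memory store's IDs.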
At the start of the next session, retrieve what the agent knows before responding:
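```python
memories = client.retrieve_memories(
    memory_id=memory_id,
    # Long-term records are looked up by namespace; this value is illustrative
    # and must match a namespace configured on one of the memory's strategies.
    namespace="/facts/user_84",
    query="previous order issues and user preferences",
    top_k=5,
)
```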
The official AgentCore Memory documentation covers advanced retrieval patterns including relevance score thresholds and namespace-scoped configs. The open-source Python SDK is on GitHub.
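As a rough illustration of a relevance score threshold, the sketch below filters retrieved records and formats the survivors as context for the next prompt. It assumes each record is a dict with a `content.text` string and a numeric `score`; check that shape against what your SDK version actually returns. `select_relevant` and `build_context` are hypothetical helper names.

```python
from typing import Iterable, List


def select_relevant(records: Iterable[dict], min_score: float = 0.5, limit: int = 5) -> List[str]:
    """Keep the highest-scoring record texts at or above `min_score`."""
    kept = []
    for record in records:
        text = record.get("content", {}).get("text", "").strip()
        score = record.get("score", 0.0)
        if text and score >= min_score:
            kept.append((score, text))
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept[:limit]]


def build_context(memories: List[str]) -> str:
    """Render selected memories as a block to prepend to the system prompt."""
    if not memories:
        return ""
    bullets = "\n".join(f"- {m}" for m in memories)
    return "Known about this user from earlier sessions:\n" + bullets


# Sample records in the assumed shape.
sample = [
    {"content": {"text": "Order #12345 was delayed 3 days"}, "score": 0.82},
    {"content": {"text": "Prefers concise answers"}, "score": 0.64},
    {"content": {"text": "Mentioned the weather"}, "score": 0.21},
]
context = build_context(select_relevant(sample, min_score=0.5))
```

Tuning `min_score` is a trade-off: too low and stale or irrelevant facts leak into the prompt, too high and the agent appears to forget.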
Which Memory Stack Should You Use?
AgentCore Memory is not the only option. Here is an honest comparison:
| Solution | Best For | Trade-Off |
|---|---|---|
| AgentCore Memory | AWS-native teams, compliance requirements | AWS lock-in, async extraction lag |
If you are already on AWS and your agents run on Bedrock, AgentCore Memory is the path of least resistance. The IAM integration, CloudTrail audit logging, and VPC support matter in regulated environments. For everything else, Mem0 is the community default with a usable free tier and a managed option if you do not want the ops overhead.
Skip AgentCore Memory if you need real-time extraction (the async lag is real), or if you need to model entity relationships across users. That is Zep’s domain, and AgentCore Memory does not try to compete there.
The Cost Trap Nobody Mentions
Long-term memory records accumulate silently. You store events, extraction runs, records build up — and if you are not setting retention policies or TTLs, costs compound as your user base grows. The AgentCore pricing breakdown shows two components consistently drive 75–85% of Memory spend: retrieval call volume and long-term record accumulation at scale.
Audit your retention strategy before you hit production. Expire long-term records for users inactive for 90 days. It’s an easy win that keeps costs predictable.
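The 90-day rule can be sketched as a small selection step that runs on a schedule. This example only decides which users are past the cutoff; it assumes you already track a last-activity timestamp per `actor_id` in your own system, and it leaves the actual deletion to whichever delete operation your memory store provides. `actors_to_expire` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def actors_to_expire(
    last_seen: Dict[str, datetime],
    now: Optional[datetime] = None,
    max_idle_days: int = 90,
) -> List[str]:
    """Return actor IDs whose last activity is older than `max_idle_days`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(actor for actor, seen in last_seen.items() if seen < cutoff)


# Example with a fixed "now" so the result is reproducible.
reference = datetime(2025, 1, 1, tzinfo=timezone.utc)
activity = {
    "user_1": reference - timedelta(days=120),  # idle too long
    "user_2": reference - timedelta(days=10),   # recently active
    "user_3": reference - timedelta(days=90),   # exactly at the cutoff, kept
}
stale = actors_to_expire(activity, now=reference)
```

Running this before each billing cycle gives you a list to review, which is safer than deleting blindly on the first pass.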
Ship It
Persistent agent memory is no longer a research problem. AgentCore Memory, Mem0, Zep, and LangMem all ship production-ready solutions today. The AWS long-term memory deep dive and the multi-agent platform tutorial are the best starting points if you are going the AgentCore route.
The agents that win in production are not the smartest ones. They are the ones that remember.