Build Persistent, Scalable AI Agent Memory with TiDB

TiDB has introduced a unified database approach for AI agent memory that stores text, vectors, and metadata in a single table, eliminating the need for separate vector stores and sync jobs. The system follows a four-step memory loop — storing, embedding, searching, and injecting relevant past interactions into LLM prompts — using TiDB's built-in vector search and HNSW indexing for scalable nearest-neighbor lookups. Developers can implement the entire memory infrastructure on TiDB Cloud's free tier, with the complete schema and queries available in the pingcap/agent-rules repository.

Key Takeaways

- Agent memory is an infrastructure pattern, not a model feature: Store the past outside the model and inject the relevant slice into each prompt.
- Every memory system reduces to one loop: Store a row, embed it, search for the nearest vectors, and inject the matches.
- TiDB holds text, vectors, and metadata in a single table, so filtering and vector or hybrid BM25 search run in one query with no separate vector store to sync.
- You can build the whole loop on the free tier of TiDB Cloud Starter, with the full SQL and pytidb code in the pingcap/agent-rules repo.

I gave a session at Microsoft Build 2026 on agent memory with TiDB. A few people asked for the code afterward, so here’s a complete write-up of the session: The same pattern as the talk, with copy-paste-ready schema and queries. You can watch the original Microsoft Build 2026 session here: https://www.youtube.com/watch?v=J0o-Dkt5tnI

Why an Agent Memory Database is an Infrastructure Problem

Large language models are stateless. Every API call starts from scratch. Whatever a user told the agent yesterday (their preferences, their last support ticket, the back-and-forth that finally landed on the right answer) is gone the moment the response finishes streaming.

Memory is how you close that gap. It is not a model feature; it is an infrastructure pattern. You store the past somewhere outside the model, and on each new turn you pull back the relevant slice and inject it into the prompt. The agent looks like it remembers because you remembered for it.

The real engineering question is where you put that memory, and how you find the right piece of it fast.

The Four-Step Memory Loop

Strip away the framework noise and every agent memory system reduces to the same loop:

- Store each thing worth remembering as a row in a table.
- Embed the row’s text into a vector that captures its meaning.
- Search for the rows whose vectors are closest to the current query. Those are your relevant memories.
- Inject those memories into the next LLM prompt.

Summarization, fact extraction, and decay scoring all sit on top of this loop. Get the loop right first.

Your Agent Memory Database in One Table

A memory row needs to do three things at once: Hold the text, hold a vector representation of that text, and hold the metadata that tells you whose memory it is and when it was created. TiDB’s built-in vector search (https://docs.pingcap.com/tidbcloud/vector-search-overview/) lets all three live in a single table:

    CREATE TABLE memories (
      id         BIGINT PRIMARY KEY AUTO_RANDOM,
      user_id    VARCHAR(64) NOT NULL,
      content    TEXT NOT NULL,
      embedding  VECTOR(1024),
      created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
      INDEX idx_user (user_id),
      VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))
    );

A few details worth calling out:

- VECTOR(1024) is a native column type. No extension, no separate vector store running alongside your database, no sync job between them.
- The VECTOR INDEX ... VEC_COSINE_DISTANCE line builds an HNSW vector index (https://docs.pingcap.com/tidbcloud/vector-search-index/), which keeps nearest-neighbor lookups fast as the table grows.
- AUTO_RANDOM instead of AUTO_INCREMENT matters more than people coming from single-node MySQL expect. Sequential integer keys create write hotspots on a distributed system because every insert lands on the same node. Random keys spread inserts across the cluster.

Storing a Memory

If you already have an embedding pipeline, pass the vector in:

    INSERT INTO memories (user_id, content, embedding)
    VALUES (
      'user_42',
      'User prefers window seats on flights longer than 4 hours.',
      '[0.0123, -0.0456, ..., 0.0789]'
    );

If you would rather not run your own embedding pipeline, TiDB Cloud can generate vectors for you on insert.
Define the embedding column as a generated column that calls a hosted model:

    ALTER TABLE memories
      MODIFY embedding VECTOR(1024)
      GENERATED ALWAYS AS (
        EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', content)
      ) STORED;

After that, you insert text and the database produces and stores the vector:

    INSERT INTO memories (user_id, content)
    VALUES ('user_42', 'User prefers window seats on long flights.');

One less moving part to maintain. In my Build session I used OpenAI embeddings; here I switched to the Titan model because TiDB Cloud hosts it for free, so everything in this post runs with no API keys and no credit card. If you prefer OpenAI, Cohere, or Jina embeddings, swap in that model name and bring your own key.

Recalling Memories with Vector Search

When the agent receives a new message, embed the message and ask the database which stored memories mean something similar:

    SELECT id, content,
           VEC_COSINE_DISTANCE(
             embedding,
             EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2',
                        'Where should I book his seat?')
           ) AS distance
    FROM memories
    WHERE user_id = 'user_42'
    ORDER BY distance
    LIMIT 5;

The query about booking a seat returns “User prefers window seats” even though the two sentences share almost no words. That is the embedding doing its job: It matches on meaning, not on exact text.

Two things make this query nice in TiDB specifically. First, the WHERE user_id = 'user_42' filter runs in the same query as the vector search. There is no two-system dance, no joining results back together in application code. One round trip. Second, your source-of-truth memory write and your retrieval logic live in the same transactional database, which keeps the application model much simpler than syncing a separate vector system.

Hybrid Search for When Meaning is Not Enough

Pure vector search is strong at concepts and weak at proper nouns. A query like “the issue with order A-9912” will often surface memories about other orders that feel conceptually close but are not the right record.
That is where keyword search earns its place. TiDB has BM25 full-text search built in, so you can blend both signals in a single hybrid search (https://docs.pingcap.com/tidbcloud/vector-search-hybrid-search/) query:

    CREATE FULLTEXT INDEX idx_content ON memories (content);

    SELECT id, content,
           0.7 * VEC_COSINE_DISTANCE(
                   embedding,
                   EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', :query)
                 )
         - 0.3 * fts_match_score('idx_content', :query) AS hybrid_score
    FROM memories
    WHERE user_id = :user_id
    ORDER BY hybrid_score
    LIMIT 5;

The 0.7 and 0.3 are weights you tune for your workload. One gotcha worth flagging: Vector distance is lower-is-better, but BM25 relevance is higher-is-better, which is why the formula subtracts the full-text score instead of adding it. In production RAG, this hybrid setup almost always beats either approach on its own, and you pay for it with one extra index.

The Agent Memory Loop in Python with pytidb

Most AI developers I work with live in Python, not SQL. The pytidb SDK wraps the same primitives, and with auto-embedding turned on you never touch a vector directly. Insert text. Query with text. The library handles the rest.

    from pytidb import TiDBClient
    from pytidb.embeddings import EmbeddingFunction
    from pytidb.schema import TableModel, Field
    from pytidb.datatype import TEXT

    db = TiDBClient.connect(
        database_url="mysql+pymysql://USER:PASS@HOST:4000/test"
                     "?ssl_verify_cert=true&ssl_verify_identity=true"
    )

    # Auto-embedding: the embedding column is derived from content server-side.
    embed = EmbeddingFunction(model_name="tidbcloud_free/amazon/titan-embed-text-v2")

    class Memory(TableModel):
        id: int = Field(primary_key=True)
        user_id: str
        content: str = Field(sa_type=TEXT)
        embedding: list[float] = embed.VectorField(source_field="content")

    memories = db.create_table(schema=Memory, if_exists="overwrite")

    # Store. No vector math here; embeddings are generated automatically.
    memories.bulk_insert([
        Memory(user_id="user_42", content="User prefers window seats on long flights."),
        Memory(user_id="user_42", content="User flies United and Delta only."),
    ])

    # Recall. Pass plain text; pytidb embeds the query and ranks by cosine distance.
    results = (
        memories.search("Where should I book his seat?")
        .filter({"user_id": "user_42"})
        .limit(5)
        .to_list()
    )

One call to insert the memories, one chained call to retrieve the relevant ones. That is the entire loop.

Why an Agent Memory Database, Not a Dedicated Vector Store

You can build this on a dedicated vector database, with Postgres for user profiles, S3 for transcripts, and Redis for session state. Many teams start there. When memory lives in the same engine as the rest of your agent’s state, a few things change:

- Filtering, sorting, and vector search run in one query plan. WHERE user_id = ... AND created_at ... ORDER BY executes as a single statement. No application-layer joins between systems.
- You get ACID transactions across the whole agent’s state. When the agent writes a new memory, deducts a credit, and logs an event, all three commit together or none of them do. The alternative is debugging partial writes after the fact.
- You have one copy of the data. Embeddings live next to the source text. When you change embedding models, you re-embed in place. No sync pipeline drifting out of date.
- You get multi-tenancy at real scale. Each user or agent session can have an isolated branch of the database, created in milliseconds with copy-on-write storage. Manus (https://www.pingcap.com/case-study/manus-agentic-ai-database-tidb/) runs this pattern in production, creating close to 1 million database tenants within three months on its agent platform.

Agent Memory Database: Where to Go Next

Everything in this post runs on the free tier of TiDB Cloud Starter (https://www.pingcap.com/tidb/cloud/). You can build the entire memory loop (schema, auto-embedding, vector search, and hybrid retrieval) without entering a credit card.
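
The one step of the loop that the snippets in this post leave to application code is the last one, inject. Here is a minimal sketch of it in plain Python, assuming each recalled row is a dict with `content` and `distance` keys (a hypothetical shape; adapt it to whatever your query returns). The names `build_prompt`, `max_distance`, and `max_memories` are illustrative and not part of pytidb or TiDB, and the 0.6 cutoff is an arbitrary placeholder.

```python
from typing import Any, Iterable, Mapping


def build_prompt(
    question: str,
    memories: Iterable[Mapping[str, Any]],
    max_distance: float = 0.6,   # assumed cutoff; tune for your embedding model
    max_memories: int = 5,
) -> str:
    """Prepend the closest recalled memories to the user's question."""
    # Lower cosine distance means a closer match, so sort ascending.
    ordered = sorted(memories, key=lambda m: float(m["distance"]))
    kept = [
        str(m["content"])
        for m in ordered
        if float(m["distance"]) <= max_distance
    ][:max_memories]

    # With nothing relevant to inject, send the question on its own.
    if not kept:
        return f"User question: {question}"

    facts = "\n".join(f"- {text}" for text in kept)
    return (
        "Relevant facts remembered about this user:\n"
        f"{facts}\n\n"
        f"User question: {question}"
    )


# Made-up rows standing in for the result of the recall query.
recalled = [
    {"content": "User flies United and Delta only.", "distance": 0.41},
    {"content": "User prefers window seats on long flights.", "distance": 0.18},
    {"content": "User asked about a refund last week.", "distance": 0.83},
]
prompt = build_prompt("Where should I book his seat?", recalled)
print(prompt)
```

The distance cutoff is a design choice rather than a requirement: without it, a user with few stored memories gets loosely related rows injected simply because they were the nearest available.
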
The same primitives scale up to production workloads: Pinterest runs 1.5 PB of data at a peak of 8 million QPS on TiDB, and Flipkart benchmarked TiDB as a hot store to 1 million QPS. For metadata-heavy consolidation, Atlassian collapsed 750 PostgreSQL clusters down to 16 on TiDB.

If you would rather not hand-roll the memory layer, the open-source mem9 (https://mem9.ai/) project sits on top of these same TiDB primitives and provides a memory API with fact extraction, deduping, and decay built in. Same storage underneath, with the memory semantics handled for you.

The full code from this post, including the Python version using the pytidb SDK, is in the pingcap/agent-rules (https://github.com/pingcap/agent-rules) repository. Clone it, point it at a free TiDB Cloud Starter cluster, and you have a working agent memory loop in a few minutes.
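
Before tuning the 0.7 and 0.3 weights from the hybrid search section against real data, it can help to check offline how a lower-is-better combined score reorders candidates. The sketch below is plain Python with made-up distances and BM25 scores; `hybrid_score` and the candidate rows are illustrative only and do not call TiDB.

```python
def hybrid_score(
    cosine_distance: float,
    bm25_score: float,
    w_vec: float = 0.7,
    w_fts: float = 0.3,
) -> float:
    """Lower is better: distance raises the score, keyword relevance lowers it."""
    return w_vec * cosine_distance - w_fts * bm25_score


# (content, cosine distance, BM25 score); every number here is invented.
candidates = [
    ("Order A-9912 arrived damaged.", 0.35, 2.0),
    ("Order B-1044 arrived late.", 0.30, 0.0),
    ("User prefers email over phone.", 0.70, 0.0),
]

# Ranking by vector distance alone picks the conceptually close but wrong order.
vector_only_top = min(candidates, key=lambda c: c[1])

# Ranking by the blended score lets the exact keyword match win.
ranked = sorted(candidates, key=lambda c: hybrid_score(c[1], c[2]))

print("vector only:", vector_only_top[0])
for content, dist, bm25 in ranked:
    print(f"{hybrid_score(dist, bm25):+.3f}  {content}")
```

Raw BM25 scores are unbounded, so a strong keyword hit can dominate the sum; if that happens in your data, normalizing the full-text score before weighting is one option to consider.
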