{"slug": "build-persistent-scalable-ai-agent-memory-with-tidb", "title": "Build Persistent, Scalable AI Agent Memory with TiDB", "summary": "TiDB has introduced a unified database approach for AI agent memory that stores text, vectors, and metadata in a single table, eliminating the need for separate vector stores and sync jobs. The system follows a four-step memory loop — storing, embedding, searching, and injecting relevant past interactions into LLM prompts — using TiDB's built-in vector search and HNSW indexing for scalable nearest-neighbor lookups. Developers can implement the entire memory infrastructure on TiDB Cloud's free tier, with the complete schema and queries available in the pingcap/agent-rules repository.", "body_md": "Key Takeaways\n\n- Agent memory is an infrastructure pattern, not a model feature: Store the past outside the model and inject the relevant slice into each prompt.\n- Every memory system reduces to one loop: Store a row, embed it, search for the nearest vectors, and inject the matches.\n- TiDB holds text, vectors, and metadata in a single table, so filtering and vector (or hybrid BM25) search run in one query with no separate vector store to sync.\n- You can build the whole loop on the free tier of TiDB Cloud Starter, with the full SQL and pytidb code in the [pingcap/agent-rules repo](https://github.com/pingcap/agent-rules).\n\nI gave a session at Microsoft Build 2026 on agent memory with TiDB. A few people asked for the code afterward, so here’s a complete write-up of the session: The same pattern as the talk, with copy-paste-ready schema and queries.\n\nYou can watch the original Microsoft Build 2026 session [here](https://www.youtube.com/watch?v=J0o-Dkt5tnI).\n\n## Why an Agent Memory Database is an Infrastructure Problem\n\nLarge language models are stateless. Every API call starts from scratch. 
Whatever a user told the agent yesterday, their preferences, their last support ticket, the back-and-forth that finally landed on the right answer, all of it is gone the moment the response finishes streaming.\n\nMemory is how you close that gap. It is not a model feature, it is an infrastructure pattern. You store the past somewhere outside the model, and on each new turn you pull back the relevant slice and inject it into the prompt. The agent looks like it remembers because you remembered for it.\n\nThe real engineering question is where you put that memory, and how you find the right piece of it fast.\n\n## The Four-Step Memory Loop\n\nStrip away the framework noise and every agent memory system reduces to the same loop:\n\n- Store each thing worth remembering as a row in a table.\n- Embed the row’s text into a vector that captures its meaning.\n- Search for the rows whose vectors are closest to the current query. Those are your relevant memories.\n- Inject those memories into the next LLM prompt.\n\nSummarization, fact extraction, and decay scoring all sit on top of this loop. Get the loop right first.\n\n## Your Agent Memory Database in One Table\n\nA memory row needs to do three things at once: Hold the text, hold a vector representation of that text, and hold the metadata that tells you whose memory it is and when it was created. TiDB’s [built-in vector search](https://docs.pingcap.com/tidbcloud/vector-search-overview/) lets all three live in a single table:\n\n```\nCREATE TABLE memories (\n  id BIGINT PRIMARY KEY AUTO_RANDOM,\n  user_id VARCHAR(64) NOT NULL,\n  content TEXT NOT NULL,\n  embedding VECTOR(1024),\n  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n  INDEX idx_user (user_id),\n  VECTOR INDEX idx_embedding ((VEC_COSINE_DISTANCE(embedding)))\n);\n```\n\nA few details worth calling out:\n\n- `VECTOR(1024)` is a native column type. 
No extension, no separate vector store running alongside your database, no sync job between them.\n- The `VECTOR INDEX ... VEC_COSINE_DISTANCE` line builds an [HNSW vector index](https://docs.pingcap.com/tidbcloud/vector-search-index/), which keeps nearest-neighbor lookups fast as the table grows.\n- `AUTO_RANDOM` instead of `AUTO_INCREMENT` matters more than people coming from single-node MySQL expect. Sequential integer keys create write hotspots on a distributed system because every insert lands on the same node. Random keys spread inserts across the cluster.\n\n## Storing a Memory\n\nIf you already have an embedding pipeline, pass the vector in:\n\n```\nINSERT INTO memories (user_id, content, embedding)\nVALUES (\n  'user_42',\n  'User prefers window seats on flights longer than 4 hours.',\n  '[0.0123, -0.0456, ..., 0.0789]'\n);\n```\n\nIf you would rather not run your own embedding pipeline, TiDB Cloud can generate vectors for you on insert. Define the embedding column as a generated column that calls a hosted model:\n\n```\nALTER TABLE memories\nMODIFY embedding VECTOR(1024) GENERATED ALWAYS AS (\n  EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', content)\n) STORED;\n```\n\nAfter that, you insert text and the database produces and stores the vector:\n\n```\nINSERT INTO memories (user_id, content)\nVALUES ('user_42', 'User prefers window seats on long flights.');\n```\n\nOne less moving part to maintain. In my Build session I used OpenAI embeddings; here I switched to the Titan model because TiDB Cloud hosts it for free, so everything in this post runs with no API keys and no credit card. 
If you prefer OpenAI, Cohere, or Jina embeddings, swap in that model name and bring your own key.\n\n## Recalling Memories with Vector Search\n\nWhen the agent receives a new message, embed the message and ask the database which stored memories mean something similar:\n\n```\nSELECT\n  id,\n  content,\n  VEC_COSINE_DISTANCE(\n    embedding,\n    EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2',\n               'Where should I book his seat?')\n  ) AS distance\nFROM memories\nWHERE user_id = 'user_42'\nORDER BY distance\nLIMIT 5;\n```\n\nThe query about booking a seat returns “User prefers window seats” even though the two sentences share almost no words. That is the embedding doing its job: It matches on meaning, not on exact text.\n\nTwo things make this query nice in TiDB specifically. First, the `WHERE user_id = 'user_42'`\n\nfilter runs in the same query as the vector search. There is no two-system dance, no joining results back together in application code. One round trip. Second, your source-of-truth memory write and your retrieval logic live in the same transactional database, which keeps the application model much simpler than syncing a separate vector system.\n\n## Hybrid Search for When Meaning is Not Enough\n\nPure vector search is strong at concepts and weak at proper nouns. A query like “the issue with order #A-9912” will often surface memories about other orders that feel conceptually close but are not the right record. That is where keyword search earns its place. 
TiDB has BM25 full-text search built in, so you can blend both signals in a single [hybrid search](https://docs.pingcap.com/tidbcloud/vector-search-hybrid-search/) query:\n\n```\nCREATE FULLTEXT INDEX idx_content ON memories(content);\n\nSELECT\n  id,\n  content,\n  (0.7 * VEC_COSINE_DISTANCE(\n      embedding,\n      EMBED_TEXT('tidbcloud_free/amazon/titan-embed-text-v2', :query))\n   - 0.3 * fts_match_score('idx_content', :query)) AS hybrid_score\nFROM memories\nWHERE user_id = :user_id\nORDER BY hybrid_score\nLIMIT 5;\n```\n\nThe 0.7 and 0.3 are weights you tune for your workload. One gotcha worth flagging: Vector distance is lower-is-better, but BM25 relevance is higher-is-better, which is why the formula subtracts the full-text score instead of adding it. In production RAG, this hybrid setup almost always beats either approach on its own, and you pay for it with one extra index.\n\n## The Agent Memory Loop in Python with pytidb\n\nMost AI developers I work with live in Python, not SQL. The pytidb SDK wraps the same primitives, and with auto-embedding turned on you never touch a vector directly. Insert text. Query with text. The library handles the rest.\n\n``` python\nfrom pytidb import TiDBClient\nfrom pytidb.embeddings import EmbeddingFunction\nfrom pytidb.schema import TableModel, Field\nfrom pytidb.datatype import TEXT\n\ndb = TiDBClient.connect(\n    database_url=\"mysql+pymysql://USER:PASS@HOST:4000/test\"\n    \"?ssl_verify_cert=true&ssl_verify_identity=true\"\n)\n\n# Auto-embedding: the embedding column is derived from `content`\n# server-side.\nembed = EmbeddingFunction(model_name=\"tidbcloud_free/amazon/titan-embed-text-v2\")\n\nclass Memory(TableModel):\n    id: int = Field(primary_key=True)\n    user_id: str\n    content: str = Field(sa_type=TEXT)\n    embedding: list[float] = embed.VectorField(source_field=\"content\")\n\nmemories = db.create_table(schema=Memory, if_exists=\"overwrite\")\n\n# Store. 
No vector math here; embeddings are generated automatically.\nmemories.bulk_insert([\n    Memory(user_id=\"user_42\",\n           content=\"User prefers window seats on long flights.\"),\n    Memory(user_id=\"user_42\",\n           content=\"User flies United and Delta only.\"),\n])\n\n# Recall. Pass plain text; pytidb embeds the query and ranks by\n# cosine distance.\nresults = (\n    memories.search(\"Where should I book his seat?\")\n    .filter({\"user_id\": \"user_42\"})\n    .limit(5)\n    .to_list()\n)\n```\n\nThree lines to insert a memory, four to retrieve the relevant ones. That is the entire loop.\n\n## Why an Agent Memory Database, Not a Dedicated Vector Store\n\nYou can build this on a dedicated vector database, with Postgres for user profiles, S3 for transcripts, and Redis for session state. Many teams start there. When memory lives in the same engine as the rest of your agent’s state, a few things change:\n\n- Filtering, sorting, and vector search run in one query plan. `WHERE user_id = ... AND created_at > ... ORDER BY` executes as a single statement. No application-layer joins between systems.\n- You get ACID transactions across the whole agent’s state. When the agent writes a new memory, deducts a credit, and logs an event, all three commit together or none of them do. The alternative is debugging partial writes after the fact.\n- You have one copy of the data. Embeddings live next to the source text. When you change embedding models, you re-embed in place. No sync pipeline drifting out of date.\n- You get multi-tenancy at real scale. 
Each user or agent session can have an isolated branch of the database, created in milliseconds with copy-on-write storage. [Manus](https://www.pingcap.com/case-study/manus-agentic-ai-database-tidb/) runs this pattern in production, creating close to 1 million database tenants within three months on its agent platform.\n\n## Agent Memory Database: Where to Go Next\n\nEverything in this post runs on the free tier of [TiDB Cloud Starter](https://www.pingcap.com/tidb/cloud/). You can build the entire memory loop, schema, auto-embedding, vector search, and hybrid retrieval without entering a credit card. The same primitives scale up to production workloads: Pinterest runs 1.5 PB of data at a peak of 8 million QPS on TiDB, and Flipkart benchmarked TiDB as a hot store to 1 million QPS. For metadata-heavy consolidation, Atlassian collapsed 750 PostgreSQL clusters down to 16 on TiDB.\n\nIf you would rather not hand-roll the memory layer, the open-source [mem9](https://mem9.ai/) project sits on top of these same TiDB primitives and provides a memory API with fact extraction, deduping, and decay built in. Same storage underneath, with the memory semantics handled for you.\n\nThe full code from this post, including the Python version using the pytidb SDK, is in the [pingcap/agent-rules](https://github.com/pingcap/agent-rules) repository. 
Clone it, point it at a free TiDB Cloud Starter cluster, and you have a working agent memory loop in a few minutes.", "url": "https://wpnews.pro/news/build-persistent-scalable-ai-agent-memory-with-tidb", "canonical_source": "https://www.pingcap.com/blog/agent-memory-database-tidb/", "published_at": "2026-06-10 15:53:53+00:00", "updated_at": "2026-06-12 09:04:23.060439+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "large-language-models", "ai-tools"], "entities": ["TiDB", "TiDB Cloud", "pingcap/agent-rules", "Microsoft Build"], "alternates": {"html": "https://wpnews.pro/news/build-persistent-scalable-ai-agent-memory-with-tidb", "markdown": "https://wpnews.pro/news/build-persistent-scalable-ai-agent-memory-with-tidb.md", "text": "https://wpnews.pro/news/build-persistent-scalable-ai-agent-memory-with-tidb.txt", "jsonld": "https://wpnews.pro/news/build-persistent-scalable-ai-agent-memory-with-tidb.jsonld"}}