{"slug": "give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector", "title": "Give Your AI Agent Persistent Long-Term Memory with Postgres and pgvector", "summary": "A tutorial shows developers how to give AI agents persistent long-term memory using Postgres with the pgvector extension and OpenAI embeddings, eliminating the need for external vector databases. The approach stores conversation history semantically and retrieves relevant past exchanges across sessions via HNSW indexing.", "body_md": "# Give Your AI Agent Persistent Long-Term Memory with Postgres and pgvector\n\nStore and retrieve conversation history semantically across sessions using pgvector's HNSW index and OpenAI embeddings, with no external vector database required.\n\n[Mariana Souza](https://sourcefeed.dev/u/mariana_souza)\n\n## What You'll Build\n\nA Python agent that embeds each conversation exchange and persists it in Postgres via pgvector. On every new message, it retrieves the most semantically similar past exchanges and injects them into the system prompt, giving the agent cross-session recall that survives process restarts.\n\n## Prerequisites\n\n- Python 3.10+\n- Docker (used for the `pgvector/pgvector` image)\n- OpenAI API key with access to `text-embedding-3-small` and `gpt-4o-mini`\n- pgvector 0.6.0+ (bundled in the Docker image below)\n\nInstall Python dependencies:\n\n```\npip install openai psycopg2-binary pgvector numpy python-dotenv\n```\n\nIf you're running a self-managed Postgres instance instead of Docker, follow the build-from-source steps at github.com/pgvector/pgvector.\n\n## Step 1: Start Postgres with pgvector\n\nThe official image ships with the extension already compiled against the correct Postgres version:\n\n```\ndocker run -d \\\n  --name agent-memory \\\n  -e POSTGRES_USER=agent \\\n  -e POSTGRES_PASSWORD=changeme \\\n  -e POSTGRES_DB=agentdb \\\n  -p 5432:5432 \\\n  pgvector/pgvector:pg16\n```\n\nVerify the extension is 
present:\n\n```\ndocker exec -it agent-memory psql -U agent -d agentdb \\\n  -c \"SELECT extversion FROM pg_extension WHERE extname = 'vector';\"\n```\n\nIf no row comes back, connect via psql and run `CREATE EXTENSION vector;` manually. That's a one-time operation per database.\n\n## Step 2: Create the Schema\n\nSave this as `schema.sql`:\n\n```\nCREATE EXTENSION IF NOT EXISTS vector;\n\nCREATE TABLE IF NOT EXISTS agent_memories (\n    id          BIGSERIAL PRIMARY KEY,\n    session_id  TEXT        NOT NULL,\n    content     TEXT        NOT NULL,\n    embedding   vector(1536),\n    created_at  TIMESTAMPTZ DEFAULT NOW()\n);\n\nCREATE INDEX IF NOT EXISTS memories_hnsw_idx\n    ON agent_memories\n    USING hnsw (embedding vector_cosine_ops);\n```\n\nApply it:\n\n```\ndocker exec -i agent-memory psql -U agent -d agentdb < schema.sql\n```\n\n`vector(1536)` matches the output dimensionality of `text-embedding-3-small`. The HNSW index is the right choice here: unlike IVFFlat, it requires no training phase, so it can be created on an empty table, and it offers a better speed-recall tradeoff. The cost is slower index builds and higher memory use, which only starts to matter at millions of rows. `vector_cosine_ops` tells Postgres to build the index for `<=>` (cosine distance) queries, which is what you want for OpenAI embeddings.\n\n## Step 3: Build the Memory Store\n\nCreate `.env`:\n\n```\nOPENAI_API_KEY=sk-...\nPG_HOST=localhost\nPG_DB=agentdb\nPG_USER=agent\nPG_PASSWORD=changeme\n```\n\nThen create `memory_store.py`. Note the `load_dotenv()` call at the top of this file. Because `client = OpenAI()` runs at import time (module level), the environment variables must be populated before the module is imported anywhere. 
Calling `load_dotenv()` here makes the module self-sufficient regardless of call order in the importer.\n\n``` python\nimport os\nimport contextlib\nimport numpy as np\nimport psycopg2\nfrom pgvector.psycopg2 import register_vector\nfrom openai import OpenAI\nfrom dotenv import load_dotenv\n\nload_dotenv()  # populate os.environ before the OpenAI client is created\n\nclient = OpenAI()\nEMBED_MODEL = \"text-embedding-3-small\"\n\n@contextlib.contextmanager\ndef _db():\n    # Open a short-lived connection: commit on success, always close.\n    conn = psycopg2.connect(\n        host=os.getenv(\"PG_HOST\", \"localhost\"),\n        dbname=os.getenv(\"PG_DB\", \"agentdb\"),\n        user=os.getenv(\"PG_USER\", \"agent\"),\n        password=os.environ[\"PG_PASSWORD\"],\n    )\n    register_vector(conn)\n    try:\n        yield conn\n        conn.commit()\n    finally:\n        conn.close()\n\ndef embed(text: str) -> np.ndarray:\n    # One embeddings API call per text; float32 matches pgvector's storage.\n    raw = client.embeddings.create(model=EMBED_MODEL, input=text).data[0].embedding\n    return np.array(raw, dtype=np.float32)\n\ndef save_memory(session_id: str, content: str) -> None:\n    vec = embed(content)\n    with _db() as conn, conn.cursor() as cur:\n        cur.execute(\n            \"INSERT INTO agent_memories (session_id, content, embedding) \"\n            \"VALUES (%s, %s, %s)\",\n            (session_id, content, vec),\n        )\n\ndef recall(query: str, top_k: int = 4) -> list[str]:\n    vec = embed(query)\n    with _db() as conn, conn.cursor() as cur:\n        # Nearest rows by cosine distance, across all sessions.\n        cur.execute(\n            \"\"\"\n            SELECT content\n            FROM agent_memories\n            ORDER BY embedding <=> %s\n            LIMIT %s\n            \"\"\",\n            (vec, top_k),\n        )\n        return [row[0] for row in cur.fetchall()]\n```\n\n`register_vector(conn)` teaches psycopg2 how to serialize numpy arrays into pgvector's text wire format and deserialize results back. It patches adapters on the connection object, so each new connection needs the call. 
pgvector stores vectors as 32-bit floats internally, so `np.float32` matches the storage precision exactly.\n\n`recall` queries across all sessions by default. If multiple users need strict memory isolation, add a `WHERE` filter on a stable per-user value, for example by storing a user ID in `session_id` instead of a per-run UUID. Filtering with `WHERE session_id = %s` as written also isolates memories, but because Step 4 generates a fresh `session_id` on every run, it limits recall to the current session.\n\n## Step 4: The Agent Loop\n\nCreate `agent.py`. The import order here is intentional: `load_dotenv()` must fire before `memory_store` is imported. Even though `memory_store.py` now also calls `load_dotenv()`, making it self-sufficient, keeping this order in `agent.py` is defensive and costs nothing.\n\n``` python\nfrom dotenv import load_dotenv\nload_dotenv()  # must run before memory_store is imported\n\nimport uuid\nfrom openai import OpenAI\nfrom memory_store import save_memory, recall\n\nclient = OpenAI()\nSESSION_ID = str(uuid.uuid4())  # new ID on every process start\n\ndef build_system_prompt(memories: list[str]) -> str:\n    if not memories:\n        return \"You are a helpful assistant.\"\n    block = \"\\n\".join(f\"- {m}\" for m in memories)\n    return (\n        \"You are a helpful assistant.\\n\"\n        \"Relevant memories from past conversations:\\n\"\n        f\"{block}\\n\\n\"\n        \"Draw on these only when they're relevant.\"\n    )\n\ndef chat(user_message: str) -> str:\n    memories = recall(user_message, top_k=4)\n    response = client.chat.completions.create(\n        model=\"gpt-4o-mini\",\n        messages=[\n            {\"role\": \"system\", \"content\": build_system_prompt(memories)},\n            {\"role\": \"user\", \"content\": user_message},\n        ],\n    )\n    reply = response.choices[0].message.content\n    save_memory(SESSION_ID, f\"User: {user_message}\\nAssistant: {reply}\")\n    return reply\n\nif __name__ == \"__main__\":\n    print(f\"Session: {SESSION_ID}\\nType 'quit' to exit.\\n\")\n    while True:\n        msg = input(\"You: \").strip()\n        if not msg or msg.lower() in (\"quit\", \"exit\"):\n            break\n        print(f\"Agent: 
{chat(msg)}\\n\")\n```\n\nEach turn follows the same sequence: embed the query, pull the top-4 semantically similar memories from Postgres, inject them into the system prompt, generate a response, then store the full exchange as a new memory. Storing the exchange as a single string (`User: ... / Assistant: ...`) works well for recall because a single embedding captures both sides of the turn. An alternative is to store an LLM-synthesized summary of each turn instead of the raw exchange, which reduces noise in retrieval at the cost of an extra LLM call per turn.\n\n## Verify It Works\n\nRun a first session and tell the agent something specific:\n\n```\nYou: My name is Priya and I'm building a Rust compiler for embedded targets.\nAgent: That's a fascinating project, Priya...\n```\n\nExit with `quit`, then restart the script entirely:\n\n```\npython agent.py\nYou: Do you remember what project I'm working on?\nAgent: Yes, you mentioned you're building a Rust compiler for embedded targets.\n```\n\nThat recall came from Postgres, not any in-process state. To inspect what's stored:\n\n```\ndocker exec -it agent-memory psql -U agent -d agentdb \\\n  -c \"SELECT session_id, left(content, 80), created_at FROM agent_memories ORDER BY created_at DESC LIMIT 5;\"\n```\n\n## Troubleshooting\n\n- **OpenAIError: The api_key client option must be set** - Your `.env` file isn't being found or `OPENAI_API_KEY` is missing from it. Confirm the file sits next to the scripts: by default python-dotenv searches the calling script's directory and then its parents, not the shell's working directory. Both `memory_store.py` and `agent.py` call `load_dotenv()`, so one of them should pick it up, but python-dotenv won't override a variable that's already set to an empty string in the shell environment.\n- **ERROR: type \"vector\" does not exist** - The extension isn't enabled in this database. Run `CREATE EXTENSION vector;` inside psql. One-time per database, not per connection.\n- **connection refused on port 5432** - The container stopped. 
Check with `docker ps -a`, then `docker start agent-memory`.\n- **ERROR: expected 1536 dimensions, not N** - You mixed embedding models or changed the `vector(N)` column after inserting rows. Drop the table, re-apply `schema.sql`, and commit to one model for the life of the table.\n- **Retrieved memories are irrelevant** - `LIMIT` always returns the nearest rows, so with few rows stored the results can include weak matches. Add a distance threshold: add `WHERE (embedding <=> %s) < 0.4` to the SELECT, pass the query vector a second time for the new placeholder, and tune from there. You can also filter by `created_at` to prefer recent memories.\n\n## Next Steps\n\n- **Memory compression:** Periodically summarize older memories with the LLM and replace raw exchanges with a condensed form. Fewer rows, better signal-to-noise.\n- **Importance scoring:** Add a `relevance_score FLOAT` column. Rank retrieved memories by a blend of semantic similarity and recency rather than cosine distance alone.\n- **Async:** Swap psycopg2 for asyncpg and the `pgvector` package's asyncpg adapter for use inside async agent frameworks like LangGraph or Pydantic AI.\n- **Structured extraction:** Before embedding, extract named entities and key facts from the exchange. Embedding structured facts rather than raw conversation text can sharpen retrieval precision.\n\n[Mariana Souza](https://sourcefeed.dev/u/mariana_souza) · Senior Editor\n\nMariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. 
When she isn't stress-testing the latest foundation models, she's usually at a local hackathon.", "url": "https://wpnews.pro/news/give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector", "canonical_source": "https://sourcefeed.dev/a/give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector", "published_at": "2026-07-04 07:44:23+00:00", "updated_at": "2026-07-04 07:52:54.088667+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "ai-tools", "large-language-models", "machine-learning"], "entities": ["Postgres", "pgvector", "OpenAI", "Docker", "Python", "HNSW", "text-embedding-3-small", "gpt-4o-mini"], "alternates": {"html": "https://wpnews.pro/news/give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector", "markdown": "https://wpnews.pro/news/give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector.md", "text": "https://wpnews.pro/news/give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector.txt", "jsonld": "https://wpnews.pro/news/give-your-ai-agent-persistent-long-term-memory-with-postgres-and-pgvector.jsonld"}}