Give Your AI Agent Persistent Long-Term Memory with Postgres and pgvector

A tutorial shows developers how to give AI agents persistent long-term memory using Postgres with the pgvector extension and OpenAI embeddings, eliminating the need for external vector databases. The approach stores conversation history semantically and retrieves relevant past exchanges across sessions via HNSW indexing.

Give Your AI Agent Persistent Long-Term Memory with Postgres and pgvector Store and retrieve conversation history semantically across sessions using pgvector's HNSW index and OpenAI embeddings — no external vector database required. Mariana Souza https://sourcefeed.dev/u/mariana souza What You'll Build A Python agent that embeds each conversation exchange and persists it in Postgres via pgvector. On every new message, it retrieves the most semantically similar past exchanges and injects them into the system prompt, giving the agent cross-session recall that survives process restarts. Prerequisites - Python 3.10+ - Docker used for the pgvector/pgvector image - OpenAI API key with access to text-embedding-3-small and gpt-4o-mini - pgvector 0.6.0+ bundled in the Docker image below Install Python dependencies: pip install openai psycopg2-binary pgvector numpy python-dotenv If you're running a self-managed Postgres instance instead of Docker, follow the build-from-source steps at github.com/pgvector/pgvector. Step 1: Start Postgres with pgvector The official image ships with the extension already compiled against the correct Postgres version: docker run -d \ --name agent-memory \ -e POSTGRES USER=agent \ -e POSTGRES PASSWORD=changeme \ -e POSTGRES DB=agentdb \ -p 5432:5432 \ pgvector/pgvector:pg16 Verify the extension is present: docker exec -it agent-memory psql -U agent -d agentdb \ -c "SELECT extversion FROM pg extension WHERE extname = 'vector';" If no row comes back, connect via psql and run CREATE EXTENSION vector; manually. That's a one-time operation per database. Step 2: Create the Schema Save this as schema.sql : CREATE EXTENSION IF NOT EXISTS vector; CREATE TABLE IF NOT EXISTS agent memories id BIGSERIAL PRIMARY KEY, session id TEXT NOT NULL, content TEXT NOT NULL, embedding vector 1536 , created at TIMESTAMPTZ DEFAULT NOW ; CREATE INDEX IF NOT EXISTS memories hnsw idx ON agent memories USING hnsw embedding vector cosine ops ; Apply it: docker exec -i agent-memory psql -U agent -d agentdb < schema.sql vector 1536 matches text-embedding-3-small 's output dimensionality. The HNSW index is the right choice here: unlike IVFFlat, it requires no training phase and delivers better recall. The tradeoff is slightly higher memory use, which only matters at millions of rows. vector cosine ops tells Postgres to optimize the index for <= cosine distance queries, which is what you want for OpenAI embeddings. Step 3: Build the Memory Store Create .env : OPENAI API KEY=sk-... PG HOST=localhost PG DB=agentdb PG USER=agent PG PASSWORD=changeme Then memory store.py . Note the load dotenv call at the top of this file. Because client = OpenAI runs at import time module level , the environment variables must be populated before the module is imported anywhere. Calling load dotenv here makes the module self-sufficient regardless of call order in the importer. python import os import contextlib import numpy as np import psycopg2 from pgvector.psycopg2 import register vector from openai import OpenAI from dotenv import load dotenv load dotenv client = OpenAI EMBED MODEL = "text-embedding-3-small" @contextlib.contextmanager def db : conn = psycopg2.connect host=os.getenv "PG HOST", "localhost" , dbname=os.getenv "PG DB", "agentdb" , user=os.getenv "PG USER", "agent" , password=os.environ "PG PASSWORD" , register vector conn try: yield conn conn.commit finally: conn.close def embed text: str - np.ndarray: raw = client.embeddings.create model=EMBED MODEL, input=text .data 0 .embedding return np.array raw, dtype=np.float32 def save memory session id: str, content: str - None: vec = embed content with db as conn, conn.cursor as cur: cur.execute "INSERT INTO agent memories session id, content, embedding " "VALUES %s, %s, %s ", session id, content, vec , def recall query: str, top k: int = 4 - list str : vec = embed query with db as conn, conn.cursor as cur: cur.execute """ SELECT content FROM agent memories ORDER BY embedding <= %s LIMIT %s """, vec, top k , return row 0 for row in cur.fetchall register vector conn teaches psycopg2 how to serialize numpy arrays into pgvector's text wire format and deserialize results back. It patches adapters on the connection object, so each new connection needs the call. pgvector stores vectors as 32-bit floats internally, so np.float32 matches the storage precision exactly. recall queries across all sessions by default. Add WHERE session id = %s if you have multiple users who need strict memory isolation. Step 4: The Agent Loop agent.py . The import order here is intentional: load dotenv must fire before memory store is imported. Even though memory store.py now also calls load dotenv , making it self-sufficient, keeping this order in agent.py is defensive and costs nothing. python from dotenv import load dotenv load dotenv must run before memory store is imported import uuid import os from openai import OpenAI from memory store import save memory, recall client = OpenAI SESSION ID = str uuid.uuid4 def build system prompt memories: list str - str: if not memories: return "You are a helpful assistant." block = "\n".join f"- {m}" for m in memories return "You are a helpful assistant.\n" "Relevant memories from past conversations:\n" f"{block}\n\n" "Draw on these only when they're relevant." def chat user message: str - str: memories = recall user message, top k=4 response = client.chat.completions.create model="gpt-4o-mini", messages= {"role": "system", "content": build system prompt memories }, {"role": "user", "content": user message}, , reply = response.choices 0 .message.content save memory SESSION ID, f"User: {user message}\nAssistant: {reply}" return reply if name == " main ": print f"Session: {SESSION ID}\nType 'quit' to exit.\n" while True: msg = input "You: " .strip if not msg or msg.lower in "quit", "exit" : break print f"Agent: {chat msg }\n" Each turn follows the same sequence: embed the query, pull the top-4 semantically similar memories from Postgres, inject them into the system prompt, generate a response, then store the full exchange as a new memory. Storing the exchange as a single string User: ... / Assistant: ... works well for recall because a single embedding captures both sides of the turn. An alternative: store only LLM-synthesized summaries per turn, cutting embedding calls in half and reducing noise in retrieval. Verify It Works Run a first session and tell the agent something specific: You: My name is Priya and I'm building a Rust compiler for embedded targets. Agent: That's a fascinating project, Priya... Exit with quit , then restart the script entirely: python agent.py You: Do you remember what project I'm working on? Agent: Yes, you mentioned you're building a Rust compiler for embedded targets. That recall came from Postgres, not any in-process state. To inspect what's stored: docker exec -it agent-memory psql -U agent -d agentdb \ -c "SELECT session id, left content, 80 , created at FROM agent memories ORDER BY created at DESC LIMIT 5;" Troubleshooting OpenAIError: The api key client option must be set - Your .env file isn't being found or OPENAI API KEY is missing from it. Confirm the file is in the same directory you're running the script from. Both memory store.py and agent.py call load dotenv , so one of them should pick it up, but python-dotenv won't override a variable that's already set to an empty string in the shell environment. ERROR: type "vector" does not exist - The extension isn't enabled in this database. Run CREATE EXTENSION vector; inside psql. One-time per database, not per connection. connection refused on port 5432 - The container stopped. Check with docker ps -a , then docker start agent-memory . ERROR: expected 1536 dimensions, not N - You mixed embedding models or changed the vector N column after inserting rows. Drop the table, re-apply schema.sql , and commit to one model for the life of the table. Retrieved memories are irrelevant - With few rows, cosine similarity has little to work with and can surface weak matches. Add a distance threshold: modify the SELECT to WHERE embedding <= %s < 0.4 and tune from there. You can also filter by created at to prefer recent memories. Next Steps Memory compression: Periodically summarize older memories with the LLM and replace raw exchanges with a condensed form. Fewer rows, better signal-to-noise. Importance scoring: Add a relevance score FLOAT column. Rank retrieved memories by a blend of semantic similarity and recency rather than cosine distance alone. Async: Swap psycopg2 for asyncpg and the pgvector package's asyncpg adapter for use inside async agent frameworks like LangGraph or Pydantic AI. Structured extraction: Before embedding, extract named entities and key facts from the exchange. Embedding structured facts rather than raw conversation text sharpens retrieval precision considerably. Mariana Souza https://sourcefeed.dev/u/mariana souza · Senior Editor Mariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. When she isn't stress-testing the latest foundation models, she's usually at a local hackathon. Discussion 0 No comments yet Be the first to weigh in.