{"slug": "rag-with-openai-embeddings-pgvector-and-langchain", "title": "RAG with OpenAI Embeddings, pgvector and LangChain", "summary": "A developer demonstrated a complete Retrieval-Augmented Generation (RAG) pipeline using OpenAI embeddings, PostgreSQL with pgvector, and LangChain for document chunking. The implementation stores knowledge as 1536-dimensional vector embeddings, retrieves the most semantically relevant chunks via cosine distance search, and generates grounded answers from the retrieved context. The guide provides working code for embedding queries and documents, creating vector database tables, and implementing threshold-based filtering to discard weak matches.", "body_md": "Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.\n\nThis guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.\n\nPackages used: `openai`, `langchain`, `pg`, `pgvector`.\n\nEmbeddings are numeric vectors that represent the semantic meaning of text. 
Similar text should produce vectors that are close in vector space.\n\nIn practice:\n\n``` js\nimport OpenAI from 'openai';\n\nconst client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });\n```\n\nUse a single string when embedding a user query.\n\n``` js\nconst response = await client.embeddings.create({\n  model: 'text-embedding-3-small',\n  input: 'How do I connect pgvector to PostgreSQL?',\n});\n\nconst queryEmbedding = response.data[0].embedding;\nconsole.log(queryEmbedding.length); // 1536\n```\n\nUse an array to embed multiple chunks in one API call.\n\n``` js\nconst chunks = [\n  'pgvector adds vector similarity search to PostgreSQL.',\n  'LangChain helps split long documents into retrieval-friendly chunks.',\n  'RAG retrieves context first, then asks an LLM to answer.',\n];\n\n// Named differently from the query call above so both snippets can share one module.\nconst batchResponse = await client.embeddings.create({\n  model: 'text-embedding-3-small',\n  input: chunks,\n});\n\n// Pair each input chunk with the embedding returned at the same position.\nconst rows = batchResponse.data.map((item, index) => ({\n  text: chunks[index],\n  embedding: item.embedding,\n}));\n\nconsole.log(rows.length); // 3\n```\n\nChunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts.\n\nStart with `chunkSize: 800` and `chunkOverlap: 120`, then adjust based on your document style and answer quality.\n\n``` js\nimport { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';\n\nconst splitter = new RecursiveCharacterTextSplitter({\n  chunkSize: 800,\n  chunkOverlap: 120,\n});\n\nconst docs = await splitter.createDocuments([\n  `RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,\n]);\n\nconsole.log(docs.map((doc) => doc.pageContent));\n```\n\nCreate a table with a vector column. 
`text-embedding-3-small` outputs 1536 dimensions.\n\n``` sql\nCREATE EXTENSION IF NOT EXISTS vector;\n\nCREATE TABLE IF NOT EXISTS rag_chunks (\n  id BIGSERIAL PRIMARY KEY,\n  content TEXT NOT NULL,\n  embedding VECTOR(1536) NOT NULL,\n  source TEXT,\n  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()\n);\n```\n\nInsert chunk vectors from Node.js:\n\n``` js\nimport pg from 'pg';\nimport pgvector from 'pgvector/pg';\n\nconst pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });\n// Register the vector types on each client the pool opens.\npool.on('connect', async (client) => {\n  await pgvector.registerTypes(client);\n});\n\n// Store each chunk with its own embedding (`rows` comes from the batch embedding call).\nfor (const row of rows) {\n  await pool.query(\n    `INSERT INTO rag_chunks (content, embedding, source)\n     VALUES ($1, $2, $3)`,\n    [row.text, pgvector.toSql(row.embedding), 'notes.md']\n  );\n}\n```\n\nEmbed the user question, then retrieve nearest chunks using cosine distance.\n\nThe `<=>` operator returns cosine distance (1 minus cosine similarity), so a lower `distance` means a closer semantic match.\n\n`top-k` is the number of nearest chunks returned (in this query, `k=4` via `LIMIT 4`).\n\nYou can also use a simple threshold (for example `0.4`) to discard weak matches.\n\nAs a starting point, many setups work well in the `0.35` to `0.45` range for cosine distance; tune the value with real questions from your domain.\n\n``` js\nconst searchResult = await pool.query(\n  `SELECT id, content, source, embedding <=> $1::vector AS distance\n   FROM rag_chunks\n   ORDER BY embedding <=> $1::vector\n   LIMIT 4`,\n  [pgvector.toSql(queryEmbedding)]\n);\n\nconst contextChunks = searchResult.rows.map((row) => row.content);\n```\n\nThreshold filtering example:\n\n``` js\nconst DISTANCE_THRESHOLD = 0.4;\nconst filteredChunks = searchResult.rows\n  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)\n  .map((row) => row.content);\n```\n\nIf no chunks pass the threshold, skip answer generation and return a fallback message:\n\n``` js\nif (filteredChunks.length === 0) {\n  console.log('I do not have enough context to answer this.');\n  process.exit(0);\n}\n```\n\nUse retrieved chunks as grounded context for the final model 
call.\n\n``` js\nconst context = contextChunks.join('\\n\\n---\\n\\n');\n\nconst answer = await client.responses.create({\n  model: 'gpt-5.5',\n  instructions:\n    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',\n  input: `Context:\\n${context}\\n\\nQuestion: How does pgvector semantic search work?`,\n});\n\nconsole.log(answer.output_text);\n```\n\nRunnable scripts for this post live in the `rag-openai-embeddings-pgvector-demo` folder in the private demos repository. Get access via [code demos](https://sevic.dev/demos).", "url": "https://wpnews.pro/news/rag-with-openai-embeddings-pgvector-and-langchain", "canonical_source": "https://dev.to/zsevic/rag-with-openai-embeddings-pgvector-and-langchain-2m0g", "published_at": "2026-06-02 22:25:21+00:00", "updated_at": "2026-06-02 22:42:59.237840+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "generative-ai", "ai-tools", "ai-infrastructure"], "entities": ["OpenAI", "pgvector", "PostgreSQL", "LangChain"], "alternates": {"html": "https://wpnews.pro/news/rag-with-openai-embeddings-pgvector-and-langchain", "markdown": "https://wpnews.pro/news/rag-with-openai-embeddings-pgvector-and-langchain.md", "text": "https://wpnews.pro/news/rag-with-openai-embeddings-pgvector-and-langchain.txt", "jsonld": "https://wpnews.pro/news/rag-with-openai-embeddings-pgvector-and-langchain.jsonld"}}