RAG with OpenAI Embeddings, pgvector and LangChain

A developer demonstrated a complete Retrieval-Augmented Generation (RAG) pipeline using OpenAI embeddings, PostgreSQL with pgvector, and LangChain for document chunking. The implementation stores knowledge as 1536-dimensional vector embeddings, retrieves the most semantically relevant chunks via cosine distance search, and generates grounded answers from the retrieved context. The guide provides working code for embedding queries and documents, creating vector database tables, and implementing threshold-based filtering to discard weak matches.

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context. This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.

Packages used: `openai`, `langchain`, `pg`, `pgvector`.

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space. In practice, start by creating the API client:

```js
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
```

Use a single string when embedding a user query.

```js
const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I connect pgvector to PostgreSQL?',
});

const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length); // 1536
```

Use an array to embed multiple chunks in one API call.

```js
const chunks = [
  'pgvector adds vector similarity search to PostgreSQL.',
  'LangChain helps split long documents into retrieval-friendly chunks.',
  'RAG retrieves context first, then asks an LLM to answer.',
];

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});

// Results come back in input order, so pair each vector with its source text.
const rows = response.data.map((item, index) => ({
  text: chunks[index],
  embedding: item.embedding,
}));

console.log(rows.length); // 3
```

Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts. Start with `chunkSize: 800` and `chunkOverlap: 120`, then adjust based on your document style and answer quality.

```js
// Newer LangChain releases publish the splitter as '@langchain/textsplitters'.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120,
});

const docs = await splitter.createDocuments([
  'RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.',
]);

console.log(docs.map((doc) => doc.pageContent));
```

Create a table with a vector column. `text-embedding-3-small` outputs 1536 dimensions, so the column is declared as `VECTOR(1536)`.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL,
  source TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```

Insert chunk vectors from Node.js:

```js
import pg from 'pg';
import pgvector from 'pgvector/pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Register the vector type on every connection the pool opens.
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});

await pool.query(
  'INSERT INTO rag_chunks (content, embedding, source) VALUES ($1, $2, $3)',
  ['Chunk content', pgvector.toSql(queryEmbedding), 'notes.md']
);
```

Embed the user question, then retrieve the nearest chunks using cosine distance (the `<=>` operator in pgvector). Lower distance means a closer semantic match. Top-k is the number of nearest chunks the query returns; here k = 4, set by `LIMIT 4`. You can also use a simple threshold (for example, 0.4) to discard weak matches. As a starting point, many setups work well in the 0.35 to 0.45 range for cosine distance; tune the value with real questions from your domain.

```js
const searchResult = await pool.query(
  `SELECT id, content, source, embedding <=> $1::vector AS distance
   FROM rag_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 4`,
  [pgvector.toSql(queryEmbedding)]
);

const contextChunks = searchResult.rows.map((row) => row.content);
```

Threshold filtering example:

```js
const DISTANCE_THRESHOLD = 0.4;

const filteredChunks = searchResult.rows
  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
  .map((row) => row.content);
```

If no chunks pass the threshold, skip answer generation and return a fallback message:

```js
if (filteredChunks.length === 0) {
  console.log('I do not have enough context to answer this.');
  process.exit(0);
}
```

Use retrieved chunks as grounded context for the final model call.
```js
// Swap in filteredChunks here if you apply the distance threshold.
const context = contextChunks.join('\n\n---\n\n');

const answer = await client.responses.create({
  model: 'gpt-5.5',
  instructions:
    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
  input: `Context:\n${context}\n\nQuestion: How does pgvector semantic search work?`,
});

console.log(answer.output_text);
```

Runnable scripts for this post live in the `rag-openai-embeddings-pgvector-demo` folder in the private demos repository. Get access via code demos (https://sevic.dev/demos).
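
To make the retrieval step easier to reason about without a database, here is a minimal in-memory sketch of what the cosine distance search, top-k limit, and threshold filter do together. The helper names `cosineDistance` and `topK` and the toy 3-dimensional vectors are hypothetical and chosen only for illustration; real embeddings have 1536 dimensions, and in the actual pipeline pgvector performs this work inside PostgreSQL.

```js
// Cosine distance as used for ranking here: 1 - cosine similarity.
// Assumes both vectors are non-zero and have the same length.
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same number of dimensions');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory stand-in for "ORDER BY distance LIMIT k" followed by the threshold filter.
function topK(queryEmbedding, rows, k = 4, maxDistance = 0.4) {
  return rows
    .map((row) => ({ ...row, distance: cosineDistance(queryEmbedding, row.embedding) }))
    .sort((x, y) => x.distance - y.distance) // smallest distance first
    .slice(0, k) // keep the k nearest rows
    .filter((row) => row.distance <= maxDistance); // drop weak matches
}

// Toy vectors (hypothetical values, not real embeddings).
const toyRows = [
  { content: 'close match', embedding: [1, 0.1, 0] },
  { content: 'unrelated', embedding: [0, 0, 1] },
  { content: 'partial match', embedding: [1, 1, 0] },
];

const matches = topK([1, 0, 0], toyRows);
console.log(matches.map((row) => row.content)); // [ 'close match', 'partial match' ]
```

The 'unrelated' row is orthogonal to the query, so its distance is 1 and the 0.4 threshold removes it. The other two rows stay, ordered from nearest to farthest. Raising or lowering `maxDistance` in this sketch is a quick way to build intuition before tuning the threshold against real questions.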