Source: dev.to

RAG with OpenAI Embeddings, pgvector and LangChain

A developer demonstrated a complete Retrieval-Augmented Generation (RAG) pipeline using OpenAI embeddings, PostgreSQL with pgvector, and LangChain for document chunking. The implementation stores knowledge as 1536-dimensional vector embeddings, retrieves the most semantically relevant chunks via cosine distance search, and generates grounded answers from the retrieved context. The guide provides working code for embedding queries and documents, creating vector database tables, and implementing threshold-based filtering to discard weak matches.

3 min read · published Jun 2, 2026

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.

Packages used: openai, langchain, pg, pgvector

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.
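To make "close in vector space" concrete, here is a minimal sketch of cosine similarity on toy vectors. The three-dimensional values and the names cat, kitten and invoice are made up for illustration; real embeddings from the API are much longer.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns a value in [-1, 1]; values near 1 mean the vectors point the same way.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors (hypothetical values, not real embeddings).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

// Related texts should score higher than unrelated ones.
console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```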

In practice:

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Use a single string when embedding a user query.

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I connect pgvector to PostgreSQL?',
});

const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length);

Use an array to embed multiple chunks in one API call.

const chunks = [
  'pgvector adds vector similarity search to PostgreSQL.',
  'LangChain helps split long documents into retrieval-friendly chunks.',
  'RAG retrieves context first, then asks an LLM to answer.',
];

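// Named differently from the query `response` above so both snippets can share one module.
const chunkResponse = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});

// Pair each chunk with its embedding; results come back in input order.
const rows = chunkResponse.data.map((item, index) => ({
  text: chunks[index],
  embedding: item.embedding,
}));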

console.log(rows.length); // 3
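For larger documents, one request may not be enough, since the API limits how much input a single call can carry. The sketch below splits the chunks into batches and embeds them one batch at a time. The helper names toBatches and embedAll and the batch size of 100 are assumptions for illustration, not part of the original guide; `client` is the OpenAI client created earlier.

```javascript
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed all chunks batch by batch and return { text, embedding } rows.
// batchSize = 100 is an illustrative default, not a documented limit.
async function embedAll(client, chunks, batchSize = 100) {
  const rows = [];
  for (const batch of toBatches(chunks, batchSize)) {
    const res = await client.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch,
    });
    // Each returned item carries the index of the input it belongs to.
    for (const item of res.data) {
      rows.push({ text: batch[item.index], embedding: item.embedding });
    }
  }
  return rows;
}
```

Calling `await embedAll(client, chunks)` would then produce the same shape as the `rows` array above.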

Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts.

Start with chunkSize: 800 and chunkOverlap: 120, then adjust based on your document style and answer quality.

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120,
});

const docs = await splitter.createDocuments([
  `RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,
]);

console.log(docs.map((doc) => doc.pageContent));
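To see what chunkSize and chunkOverlap mean mechanically, here is a simplified sliding-window sketch. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers paragraph and sentence boundaries); the function name chunkWithOverlap is hypothetical and the tiny sizes are chosen only to keep the output readable.

```javascript
// Naive fixed-size chunker with overlap. It cuts purely by character count
// and ignores paragraph or sentence boundaries.
function chunkWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap; // how far each window advances
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Each chunk repeats the last 2 characters of the previous one.
console.log(chunkWithOverlap('abcdefghij', 4, 2));
// [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```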

Create a table with a vector column. text-embedding-3-small outputs 1536 dimensions by default, so the column is declared as VECTOR(1536).

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL,
  source TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Insert chunk vectors from Node.js:

import pg from 'pg';
import pgvector from 'pgvector/pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
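// Register the vector type on each new connection the pool opens.
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});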

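// Store every chunk with its own embedding (rows comes from the batch embedding step).
for (const row of rows) {
  await pool.query(
    `INSERT INTO rag_chunks (content, embedding, source)
     VALUES ($1, $2, $3)`,
    [row.text, pgvector.toSql(row.embedding), 'notes.md']
  );
}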
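An embedding whose length does not match the column's declared dimension is rejected by PostgreSQL at insert time. As a rough sketch, a guard like the one below could catch that earlier and show the text form pgvector accepts, such as [0.1,0.2,0.3]. The helper name toVectorLiteral is hypothetical; in the snippets here pgvector.toSql handles the serialization.

```javascript
const EXPECTED_DIMENSIONS = 1536; // matches VECTOR(1536) in the table definition

// Validate an embedding and serialize it as a pgvector text literal.
function toVectorLiteral(embedding, dimensions = EXPECTED_DIMENSIONS) {
  if (!Array.isArray(embedding) || embedding.length !== dimensions) {
    throw new Error(`Expected an array of ${dimensions} numbers`);
  }
  if (!embedding.every(Number.isFinite)) {
    throw new Error('Embedding contains a non-finite value');
  }
  return `[${embedding.join(',')}]`;
}

// Small dimension used here only to keep the example readable.
console.log(toVectorLiteral([1, 2.5, -3], 3)); // [1,2.5,-3]
```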

Embed the user question, then retrieve nearest chunks using cosine distance.

Lower distance means a closer semantic match. top-k is how many nearest chunks you return (in this query, k=4 with LIMIT 4).

You can also use a simple threshold (for example 0.4) to discard weak matches. As a starting point, many setups work well in the 0.35 to 0.45 range for cosine distance; then tune with real questions from your domain.

const searchResult = await pool.query(
  `SELECT id, content, source, embedding <=> $1::vector AS distance
   FROM rag_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 4`,
  [pgvector.toSql(queryEmbedding)]
);

const contextChunks = searchResult.rows.map((row) => row.content);
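Conceptually, the ORDER BY ... LIMIT query does something like the in-memory sketch below: compute the cosine distance (1 minus cosine similarity, which is what the <=> operator returns) from the query to every stored vector, sort ascending, and keep the first k. The function names and the two-dimensional toy data are assumptions for illustration; in a real setup PostgreSQL does this work, optionally with an index.

```javascript
// Cosine distance: 1 - cosine similarity. 0 means same direction.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

// Return the k rows closest to the query, nearest first.
function topK(queryVector, storedRows, k) {
  return storedRows
    .map((row) => ({ ...row, distance: cosineDistance(queryVector, row.embedding) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Toy 2-dimensional data (hypothetical values).
const toyRows = [
  { content: 'far', embedding: [0, 1] },
  { content: 'near', embedding: [1, 0.1] },
  { content: 'middle', embedding: [1, 1] },
];

console.log(topK([1, 0], toyRows, 2).map((r) => r.content)); // [ 'near', 'middle' ]
```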

Threshold filtering example:

const DISTANCE_THRESHOLD = 0.4;
const filteredChunks = searchResult.rows
  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
  .map((row) => row.content);

If no chunks pass the threshold, skip answer generation and return a fallback message:

if (filteredChunks.length === 0) {
  console.log('I do not have enough context to answer this.');
  process.exit(0);
}
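Exiting the process suits a one-off script. Inside a server or a reusable module, returning the fallback as a value may be more convenient. A minimal sketch, assuming rows shaped like the search result above (content and distance fields); the name selectContext and the return shape are hypothetical:

```javascript
const FALLBACK_MESSAGE = 'I do not have enough context to answer this.';

// Keep only rows within the distance threshold. Returns either the chunks
// to use as context or a fallback message, without ending the process.
function selectContext(rows, threshold = 0.4) {
  const chunks = rows
    .filter((row) => Number(row.distance) <= threshold)
    .map((row) => row.content);
  return chunks.length > 0
    ? { ok: true, chunks }
    : { ok: false, message: FALLBACK_MESSAGE };
}
```

The caller can then check `ok` and skip the model call when it is false.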

Use retrieved chunks as grounded context for the final model call.

// Use the threshold-filtered chunks so weak matches stay out of the prompt.
const context = filteredChunks.join('\n\n---\n\n');

const answer = await client.responses.create({
  model: 'gpt-5.5',
  instructions:
    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
  input: `Context:\n${context}\n\nQuestion: How does pgvector semantic search work?`,
});

console.log(answer.output_text);
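With longer chunks or a larger k, the joined context can grow quickly. One option is to cap it before building the model input. The sketch below uses a character budget as a rough stand-in for a token budget; the helper names buildContext and buildInput and the 6000-character default are assumptions for illustration, not values from the original guide.

```javascript
// Join chunks into a context string, stopping before a character budget is exceeded.
// Chunks are expected nearest-first, so the best matches are kept.
function buildContext(chunks, maxChars = 6000, separator = '\n\n---\n\n') {
  const kept = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = chunk.length + (kept.length > 0 ? separator.length : 0);
    if (used + cost > maxChars) break;
    kept.push(chunk);
    used += cost;
  }
  return kept.join(separator);
}

// Build the same "Context / Question" input format used in the model call.
function buildInput(chunks, question, maxChars) {
  return `Context:\n${buildContext(chunks, maxChars)}\n\nQuestion: ${question}`;
}
```

The result of `buildInput(filteredChunks, question)` can be passed as `input` to the model call.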

Runnable scripts for this post live in the rag-openai-embeddings-pgvector-demo folder in the private demos repository. Get access via code demos.
