RAG with OpenAI Embeddings, pgvector and LangChain

Source: dev.to · Published: 2026-06-02T22:25Z

A developer demonstrated a complete Retrieval-Augmented Generation (RAG) pipeline using OpenAI embeddings, PostgreSQL with pgvector, and LangChain for document chunking. The implementation stores knowledge as 1536-dimensional vector embeddings, retrieves the most semantically relevant chunks via cosine distance search, and generates grounded answers from the retrieved context. The guide provides working code for embedding queries and documents, creating vector database tables, and implementing threshold-based filtering to discard weak matches.


Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.

Packages used: openai, langchain, pg, pgvector.

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.
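To make "close in vector space" concrete, here is a minimal sketch of cosine similarity and cosine distance. The three-dimensional vectors are made-up toy values, not real embeddings, and the function names are chosen for illustration only.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance is 1 minus cosine similarity; lower means more similar.
const cosineDistance = (a, b) => 1 - cosineSimilarity(a, b);

// Toy vectors (hypothetical values standing in for real embeddings).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.log(cosineDistance(cat, kitten) < cosineDistance(cat, invoice)); // true
```

Real embeddings behave the same way, only with far more dimensions.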

In practice:

import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

Use a single string when embedding a user query.

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I connect pgvector to PostgreSQL?',
});

const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length);

Use an array to embed multiple chunks in one API call.

const chunks = [
  'pgvector adds vector similarity search to PostgreSQL.',
  'LangChain helps split long documents into retrieval-friendly chunks.',
  'RAG retrieves context first, then asks an LLM to answer.',
];

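// Named batchResponse so it does not clash with the earlier `response` declaration.
const batchResponse = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});

// Pair each chunk with its embedding.
const rows = batchResponse.data.map((item, index) => ({
  text: chunks[index],
  embedding: item.embedding,
}));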

console.log(rows.length); // 3
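For longer documents, the chunk list may need to be spread across several requests. The sketch below shows one possible way to do that. `toBatches`, `pairWithEmbeddings`, and the batch size of 2 are hypothetical names and values picked for illustration. Pairing uses the `index` field carried by each embedding item rather than array position. The response data is a hand-written stand-in shaped like `response.data`, so the block runs without an API call.

```javascript
// Split an array into consecutive batches of at most batchSize items.
function toBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Pair each text with its embedding using the item's `index` field.
function pairWithEmbeddings(texts, data) {
  return data.map((item) => ({
    text: texts[item.index],
    embedding: item.embedding,
  }));
}

const texts = ['chunk A', 'chunk B', 'chunk C'];
const batches = toBatches(texts, 2);
console.log(batches.length); // 2

// Stand-in for response.data of the first batch (made-up numbers).
const fakeData = [
  { index: 1, embedding: [0.3, 0.4] },
  { index: 0, embedding: [0.1, 0.2] },
];
const paired = pairWithEmbeddings(batches[0], fakeData);
console.log(paired[0].text); // 'chunk B'
```

In a real run, each batch would be passed as `input` to `client.embeddings.create` and the resulting `data` array handed to `pairWithEmbeddings`.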

Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts.

Start with chunkSize: 800 and chunkOverlap: 120, then adjust based on your document style and answer quality.

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120,
});

const docs = await splitter.createDocuments([
  `RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,
]);

console.log(docs.map((doc) => doc.pageContent));
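To see what overlap means, here is a deliberately simplified sketch that cuts text into fixed-width character windows. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers natural separators such as paragraphs and sentences); `splitWithOverlap` is a hypothetical helper, and the tiny sizes are chosen only so the output is easy to read.

```javascript
// Fixed-width character windows where each chunk repeats the last
// `chunkOverlap` characters of the previous one.
function splitWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const chunks = [];
  const step = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

const parts = splitWithOverlap('abcdefghijklmnopqrstuvwxyz', 10, 3);
console.log(parts);
// [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each chunk starts with the last three characters of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.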

Create a table with a vector column. text-embedding-3-small outputs 1536 dimensions.

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL,
  source TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
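Because the column is declared as VECTOR(1536), a vector of any other length cannot be stored in it. A small guard before inserting can catch a mismatched model or a malformed response early. This is a minimal sketch; `isValidEmbedding` and `EMBEDDING_DIMENSIONS` are hypothetical names, not part of pgvector or the OpenAI SDK.

```javascript
// Dimension of the VECTOR(1536) column defined in rag_chunks.
const EMBEDDING_DIMENSIONS = 1536;

// True only for an array of finite numbers with the expected length.
function isValidEmbedding(embedding, dimensions = EMBEDDING_DIMENSIONS) {
  return (
    Array.isArray(embedding) &&
    embedding.length === dimensions &&
    embedding.every((value) => Number.isFinite(value))
  );
}

console.log(isValidEmbedding(new Array(1536).fill(0.01))); // true
console.log(isValidEmbedding(new Array(3072).fill(0.01))); // false
```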

Insert chunk vectors from Node.js:

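import pg from 'pg';
import pgvector from 'pgvector/pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
// Register the vector type on each new pooled connection.
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});

// Insert each chunk with its embedding (rows comes from the batch embedding step).
for (const row of rows) {
  await pool.query(
    `INSERT INTO rag_chunks (content, embedding, source)
     VALUES ($1, $2, $3)`,
    [row.text, pgvector.toSql(row.embedding), 'notes.md']
  );
}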
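When many chunks need to be stored, one multi-row statement can save round trips. The sketch below only builds the SQL text and parameter list, so it runs without a database. `toVectorLiteral` and `buildBulkInsert` are hypothetical helpers; the first formats an array in pgvector's text form (`[0.1,0.2]`), and the two-dimensional embeddings are toy values that would not fit the VECTOR(1536) column.

```javascript
// Format a numeric array as pgvector's text representation, e.g. '[1,2,3]'.
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

// Build one parameterized multi-row INSERT for the rag_chunks table.
function buildBulkInsert(rows, source) {
  if (rows.length === 0) {
    throw new Error('Nothing to insert');
  }
  const values = [];
  const tuples = rows.map((row, i) => {
    const base = i * 3; // three parameters per row
    values.push(row.text, toVectorLiteral(row.embedding), source);
    return `($${base + 1}, $${base + 2}::vector, $${base + 3})`;
  });
  const text =
    'INSERT INTO rag_chunks (content, embedding, source) VALUES ' +
    tuples.join(', ');
  return { text, values };
}

const statement = buildBulkInsert(
  [
    { text: 'first chunk', embedding: [0.1, 0.2] },
    { text: 'second chunk', embedding: [0.3, 0.4] },
  ],
  'notes.md'
);
console.log(statement.text);
console.log(statement.values.length); // 6
```

With a real pool, the result could be executed as `await pool.query(statement.text, statement.values)`.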

Embed the user question, then retrieve nearest chunks using cosine distance.

Lower distance means a closer semantic match.

top-k means how many nearest chunks you return (in this query, k=4 with LIMIT 4).

You can also use a simple threshold (for example 0.4) to discard weak matches.

As a starting point, many setups work well in the 0.35 to 0.45 range for cosine distance, then tune with real questions from your domain.

const searchResult = await pool.query(
  `SELECT id, content, source, embedding <=> $1::vector AS distance
   FROM rag_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 4`,
  [pgvector.toSql(queryEmbedding)]
);

const contextChunks = searchResult.rows.map((row) => row.content);
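As a mental model for what the query does, the following sketch performs the same ranking in memory: compute the cosine distance from the query vector to every stored vector, sort ascending, and keep the first k. It is an illustration only, with toy two-dimensional vectors and hypothetical names (`topK`, `stored`); the database does this work in the actual pipeline.

```javascript
// Cosine distance: 1 minus cosine similarity.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k rows closest to the query vector, nearest first.
function topK(queryVector, rows, k) {
  return rows
    .map((row) => ({
      ...row,
      distance: cosineDistance(queryVector, row.embedding),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Toy stored rows (hypothetical content and vectors).
const stored = [
  { content: 'about vectors', embedding: [1, 0] },
  { content: 'about cooking', embedding: [0, 1] },
  { content: 'about search', embedding: [0.9, 0.3] },
];

const nearest = topK([1, 0.1], stored, 2);
console.log(nearest.map((row) => row.content));
// [ 'about vectors', 'about search' ]
```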

Threshold filtering example:

const DISTANCE_THRESHOLD = 0.4;
const filteredChunks = searchResult.rows
  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
  .map((row) => row.content);

If no chunks pass the threshold, skip answer generation and return a fallback message:

if (filteredChunks.length === 0) {
  console.log('I do not have enough context to answer this.');
  process.exit(0);
}
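In a long-running service, exiting the process is usually not an option, so the same decision can be expressed as a function that returns either the usable chunks or the fallback message. This is a minimal sketch; `selectContext`, `FALLBACK_MESSAGE`, and the sample rows are hypothetical. One sample distance is a string to exercise the `Number()` conversion used in the filter.

```javascript
const FALLBACK_MESSAGE = 'I do not have enough context to answer this.';

// Keep rows at or below the distance threshold and report whether any survived.
function selectContext(rows, threshold) {
  const chunks = rows
    .filter((row) => Number(row.distance) <= threshold)
    .map((row) => row.content);
  return chunks.length > 0
    ? { ok: true, chunks }
    : { ok: false, message: FALLBACK_MESSAGE };
}

// Sample rows shaped like the search result (made-up values).
const sampleRows = [
  { content: 'close match', distance: 0.21 },
  { content: 'weak match', distance: '0.62' },
];

console.log(selectContext(sampleRows, 0.4)); // { ok: true, chunks: [ 'close match' ] }
console.log(selectContext(sampleRows, 0.1).ok); // false
```

The caller can then skip the model call whenever `ok` is false and return `message` directly.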

Use retrieved chunks as grounded context for the final model call.

// Build the context from the chunks that passed the distance threshold.
const context = filteredChunks.join('\n\n---\n\n');
// Ask the same question that was embedded for retrieval.
const question = 'How do I connect pgvector to PostgreSQL?';

const answer = await client.responses.create({
  model: 'gpt-5.5',
  instructions:
    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
  input: `Context:\n${context}\n\nQuestion: ${question}`,
});

console.log(answer.output_text);
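One optional refinement is to label each chunk with a number and its source, which makes it easier to trace an answer back to the stored rows. The sketch below only assembles the input string; `buildInput` and the bracketed label format are assumptions made for illustration, not something the model call requires.

```javascript
// Number each chunk, tag it with its source, and append the question.
function buildInput(rows, question) {
  const context = rows
    .map((row, i) => `[${i + 1}] (${row.source ?? 'unknown'}) ${row.content}`)
    .join('\n\n---\n\n');
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

const input = buildInput(
  [
    { content: 'pgvector adds vector similarity search to PostgreSQL.', source: 'notes.md' },
    { content: 'RAG retrieves context first, then asks an LLM to answer.', source: null },
  ],
  'What does pgvector add to PostgreSQL?'
);

console.log(input);
```

The resulting string can be passed as `input` in place of the plain joined context.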

Runnable scripts for this post live in the rag-openai-embeddings-pgvector-demo folder in the private demos repository. Get access via code demos.

Read original on dev.to → dev.to/zsevic/rag-with-openai-embeddings-pgvecto…
