Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.
This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.
Packages used: openai, langchain, pg, pgvector.
Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.
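To make "close in vector space" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three toy vectors are made up for illustration; real embeddings have far more dimensions.

```javascript
// Cosine similarity: close to 1 means same direction, close to 0 means unrelated.
// Returns NaN if either vector is all zeros.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings" (hypothetical values, not model output).
const cat = [0.9, 0.1, 0.0];
const kitten = [0.8, 0.2, 0.1];
const invoice = [0.0, 0.1, 0.9];

console.log(cosineSimilarity(cat, kitten) > cosineSimilarity(cat, invoice)); // true
```

Cosine distance, used later for retrieval, is simply 1 minus this value.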
In practice, start by creating an OpenAI client:
import OpenAI from 'openai';
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
Use a single string when embedding a user query.
const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I connect pgvector to PostgreSQL?',
});
const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length);
Use an array to embed multiple chunks in one API call.
const chunks = [
  'pgvector adds vector similarity search to PostgreSQL.',
  'LangChain helps split long documents into retrieval-friendly chunks.',
  'RAG retrieves context first, then asks an LLM to answer.',
];
const batchResponse = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});
// Each returned item lines up with the input chunk at the same position.
const rows = batchResponse.data.map((item, index) => ({
  text: chunks[index],
  embedding: item.embedding,
}));
console.log(rows.length); // 3
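With more than a handful of chunks, one option is to send them in several smaller calls. The sketch below shows only the grouping step; `toBatches` is a hypothetical helper and the batch size of 2 is arbitrary, so check the API's input limits before picking a real value.

```javascript
// Hypothetical helper: split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

const exampleChunks = ['chunk a', 'chunk b', 'chunk c', 'chunk d', 'chunk e'];
const batches = toBatches(exampleChunks, 2);

console.log(batches.length); // 3
console.log(batches[2]); // [ 'chunk e' ]
```

Each batch would then be passed as the `input` array of one embeddings call, keeping the original order so texts and vectors stay paired.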
Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts.
Start with chunkSize: 800 and chunkOverlap: 120, then adjust based on your document style and answer quality.
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120,
});
const docs = await splitter.createDocuments([
  `RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,
]);
console.log(docs.map((doc) => doc.pageContent));
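To see what size and overlap do, here is a deliberately simplified fixed-window splitter. It is a stand-in for illustration only, not LangChain's algorithm (which also tries to break on separators such as paragraphs and sentences); `splitWithOverlap` is a hypothetical name.

```javascript
// Simplified sliding-window splitter: each window starts (chunkSize - chunkOverlap)
// characters after the previous one, so neighbors share `chunkOverlap` characters.
function splitWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= chunkOverlap < chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const pieces = [];
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return pieces;
}

const sample = 'abcdefghijklmnopqrstuvwxyz';
const pieces = splitWithOverlap(sample, 10, 3);

console.log(pieces);
// [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz' ]
```

Each piece repeats the last three characters of the one before it, which is what keeps a sentence that straddles a boundary retrievable from either side.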
Create a table with a vector column. text-embedding-3-small outputs 1536 dimensions by default, so the column is declared as VECTOR(1536).
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL,
  source TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Insert chunk vectors from Node.js:
import pg from 'pg';
import pgvector from 'pgvector/pg';
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
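// With a pool, register the vector type on each new client as it connects.
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});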
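// Store each chunk together with its own embedding (rows comes from the batch call above).
for (const row of rows) {
  await pool.query(
    `INSERT INTO rag_chunks (content, embedding, source)
     VALUES ($1, $2, $3)`,
    [row.text, pgvector.toSql(row.embedding), 'notes.md']
  );
}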
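Because the column is fixed at 1536 dimensions, it can help to check each embedding before inserting it. The sketch below is one possible guard; `assertEmbeddingShape` and `toVectorLiteral` are hypothetical helpers, and the second only illustrates the bracketed text form that pgvector accepts for a vector (in the code above, `pgvector.toSql` handles serialization).

```javascript
const EXPECTED_DIMENSIONS = 1536; // matches VECTOR(1536) in the table definition

// Hypothetical guard: reject embeddings that would not fit the column.
function assertEmbeddingShape(embedding, expected = EXPECTED_DIMENSIONS) {
  if (!Array.isArray(embedding) || embedding.length !== expected) {
    throw new Error(`Expected an array of ${expected} numbers`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error('Embedding contains a non-finite value');
  }
  return embedding;
}

// Illustration of the text form of a vector: a bracketed, comma-separated list.
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

console.log(toVectorLiteral([0.1, -0.2, 3])); // [0.1,-0.2,3]
console.log(assertEmbeddingShape(new Array(1536).fill(0)).length); // 1536
```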
Embed the user question, then retrieve nearest chunks using cosine distance.
Lower distance means a closer semantic match.
top-k means how many nearest chunks you return (in this query, k=4 with LIMIT 4).
You can also use a simple threshold (for example 0.4) to discard weak matches.
As a starting point, many setups work well in the 0.35 to 0.45 range for cosine distance, then tune with real questions from your domain.
const searchResult = await pool.query(
  `SELECT id, content, source, embedding <=> $1::vector AS distance
   FROM rag_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 4`,
  [pgvector.toSql(queryEmbedding)]
);
const contextChunks = searchResult.rows.map((row) => row.content);
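For intuition, the query above can be mirrored in plain JavaScript: compute the cosine distance from the query to every stored vector, sort ascending, and keep the first k. This is only a sketch over made-up 2-dimensional vectors; `topK` and the sample rows are hypothetical, and the database does this far more efficiently.

```javascript
// Cosine distance = 1 - cosine similarity (0 means same direction).
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// In-memory analogue of ORDER BY distance LIMIT k.
function topK(queryVector, storedRows, k) {
  return storedRows
    .map((row) => ({
      content: row.content,
      distance: cosineDistance(queryVector, row.embedding),
    }))
    .sort((left, right) => left.distance - right.distance)
    .slice(0, k);
}

// Hypothetical stored rows with toy embeddings.
const storedRows = [
  { content: 'pgvector setup', embedding: [1, 0] },
  { content: 'billing FAQ', embedding: [0, 1] },
  { content: 'pgvector indexes', embedding: [0.9, 0.1] },
];

const nearest = topK([1, 0], storedRows, 2);
console.log(nearest.map((row) => row.content));
// [ 'pgvector setup', 'pgvector indexes' ]
```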
Threshold filtering example:
const DISTANCE_THRESHOLD = 0.4;
const filteredChunks = searchResult.rows
  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
  .map((row) => row.content);
If no chunks pass the threshold, skip answer generation and return a fallback message:
if (filteredChunks.length === 0) {
  console.log('I do not have enough context to answer this.');
  process.exit(0);
}
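In a server or a library, exiting the process is usually not an option. One alternative, sketched below, is to return a small result object and let the caller decide; `selectContext` and the sample rows are hypothetical.

```javascript
const FALLBACK_MESSAGE = 'I do not have enough context to answer this.';

// Hypothetical helper: apply the threshold and report whether generation should run.
function selectContext(rows, threshold) {
  const kept = rows.filter((row) => Number(row.distance) <= threshold);
  if (kept.length === 0) {
    return { ok: false, message: FALLBACK_MESSAGE };
  }
  return { ok: true, chunks: kept.map((row) => row.content) };
}

// Made-up rows shaped like the search result above.
const sampleRows = [
  { content: 'pgvector adds vector similarity search.', distance: 0.21 },
  { content: 'Unrelated release notes.', distance: 0.62 },
];

console.log(selectContext(sampleRows, 0.4).chunks.length); // 1
console.log(selectContext(sampleRows, 0.1).ok); // false
```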
Use retrieved chunks as grounded context for the final model call.
// Build the context from the chunks that passed the distance threshold.
const context = filteredChunks.join('\n\n---\n\n');
const answer = await client.responses.create({
  model: 'gpt-5.5',
  instructions:
    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
  input: `Context:\n${context}\n\nQuestion: How does pgvector semantic search work?`,
});
console.log(answer.output_text);
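If retrieved chunks are long, the joined context can grow quickly. A rough sketch of one way to cap it: keep chunks in ranked order until a character budget is used up, then build the same `Context / Question` input. `buildContext`, `buildInput`, and the budget of 40 characters are assumptions for illustration; a real budget would more likely be measured in tokens.

```javascript
const SEPARATOR = '\n\n---\n\n';

// Hypothetical helper: keep chunks in ranked order until the character budget is used up.
function buildContext(chunks, maxChars) {
  const kept = [];
  let used = 0;
  for (const chunk of chunks) {
    const cost = chunk.length + (kept.length > 0 ? SEPARATOR.length : 0);
    if (used + cost > maxChars) break;
    kept.push(chunk);
    used += cost;
  }
  return kept.join(SEPARATOR);
}

// Same input layout as the model call above.
function buildInput(contextText, question) {
  return `Context:\n${contextText}\n\nQuestion: ${question}`;
}

const ranked = ['First chunk.', 'Second chunk.', 'A third chunk that will not fit.'];
const cappedContext = buildContext(ranked, 40);

console.log(buildInput(cappedContext, 'How does pgvector semantic search work?'));
```

Stopping at the first chunk that does not fit keeps the best-ranked material and avoids reordering the context.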
Runnable scripts for this post live in the rag-openai-embeddings-pgvector-demo folder in the private demos repository. Get access via code demos.