# RAG with OpenAI Embeddings, pgvector and LangChain

> Source: <https://dev.to/zsevic/rag-with-openai-embeddings-pgvector-and-langchain-2m0g>
> Published: 2026-06-02 22:25:21+00:00

Retrieval-Augmented Generation (RAG) is a practical pattern: store knowledge as embeddings, retrieve the most relevant chunks with semantic search, then generate an answer grounded in that context.

This guide shows a simple end-to-end flow with OpenAI embeddings, PostgreSQL + pgvector, and LangChain chunking.

The examples use these npm packages: `openai`, `langchain`, `pg`, `pgvector`.

Embeddings are numeric vectors that represent the semantic meaning of text. Similar text should produce vectors that are close in vector space.
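To make "close in vector space" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The three-dimensional vectors and their names are hypothetical toy values, not real model output; real embeddings have many more dimensions.

``` js
// Cosine similarity: 1 means same direction, 0 means orthogonal (unrelated).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical toy "embeddings" for three pieces of text.
const database = [1, 0, 0];
const postgres = [0.9, 0.1, 0];
const cooking = [0, 0, 1];

// The related pair scores higher than the unrelated pair.
console.log(cosineSimilarity(database, postgres) > cosineSimilarity(database, cooking)); // true
```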

In practice:

``` js
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
```

Use a single string when embedding a user query.

``` js
const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'How do I connect pgvector to PostgreSQL?',
});

const queryEmbedding = response.data[0].embedding;
console.log(queryEmbedding.length);
```

Use an array to embed multiple chunks in one API call.

``` js
const chunks = [
  'pgvector adds vector similarity search to PostgreSQL.',
  'LangChain helps split long documents into retrieval-friendly chunks.',
  'RAG retrieves context first, then asks an LLM to answer.',
];

const response = await client.embeddings.create({
  model: 'text-embedding-3-small',
  input: chunks,
});

const rows = response.data.map((item, index) => ({
  text: chunks[index],
  embedding: item.embedding,
}));

console.log(rows.length); // 3
```
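With many chunks, you may not want to send them all in a single request. A minimal sketch of splitting them into smaller batches follows; `toBatches` and the batch size of 2 are hypothetical names and values chosen for illustration, so check the API limits for your account before picking a real size.

``` js
// Split an array into consecutive batches of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const allChunks = ['a', 'b', 'c', 'd', 'e'];
console.log(toBatches(allChunks, 2).length); // 3
```

Each batch can then be passed as `input` to its own `client.embeddings.create` call, keeping the index mapping between chunks and embeddings within that batch.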

Chunking makes retrieval more precise. Instead of embedding one large document, split it into smaller overlapping parts.

Start with `chunkSize: 800` and `chunkOverlap: 120`, then adjust based on your document style and answer quality.

``` js
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 800,
  chunkOverlap: 120,
});

const docs = await splitter.createDocuments([
  `RAG combines retrieval and generation. Store chunks as vectors and fetch similar chunks at query time.`,
]);

console.log(docs.map((doc) => doc.pageContent));
```
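To see what size and overlap mean, here is a simplified sketch of fixed-size character chunking. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences); `splitWithOverlap` is a hypothetical helper for illustration only.

``` js
// Fixed-size chunks where each chunk starts `chunkSize - chunkOverlap`
// characters after the previous one, so neighbors share `chunkOverlap` characters.
function splitWithOverlap(text, chunkSize, chunkOverlap) {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(splitWithOverlap('abcdefghij', 4, 2)); // chunks: abcd, cdef, efgh, ghij
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.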

Create a table with a vector column. `text-embedding-3-small` outputs 1536 dimensions by default, so the column is declared as `VECTOR(1536)`.

``` sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding VECTOR(1536) NOT NULL,
  source TEXT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
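Because the column has a fixed size, an embedding with a different length will be rejected at insert time. A small guard like the following sketch can surface that earlier with a clearer message; `assertEmbeddingShape` and `EMBEDDING_DIMENSIONS` are hypothetical names introduced here.

``` js
// Must match VECTOR(1536) in the table definition.
const EMBEDDING_DIMENSIONS = 1536;

// Return the embedding unchanged if it looks valid, otherwise throw.
function assertEmbeddingShape(embedding, expected = EMBEDDING_DIMENSIONS) {
  if (!Array.isArray(embedding)) {
    throw new Error('Embedding must be an array of numbers');
  }
  if (embedding.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((value) => Number.isFinite(value))) {
    throw new Error('Embedding contains non-finite values');
  }
  return embedding;
}

// Usage with a placeholder vector of the right length.
assertEmbeddingShape(new Array(EMBEDDING_DIMENSIONS).fill(0.1));
```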

Insert chunk vectors from Node.js:

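``` js
import pg from 'pg';
import pgvector from 'pgvector/pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Register the vector type on every new pooled connection.
pool.on('connect', async (client) => {
  await pgvector.registerTypes(client);
});

// `rows` holds the { text, embedding } pairs built in the batch embedding step.
for (const row of rows) {
  await pool.query(
    `INSERT INTO rag_chunks (content, embedding, source)
     VALUES ($1, $2, $3)`,
    [row.text, pgvector.toSql(row.embedding), 'notes.md']
  );
}
```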

Embed the user question, then retrieve nearest chunks using cosine distance.

Lower `distance` means a closer semantic match.

`top-k` means how many nearest chunks you return (in this query, `k=4` with `LIMIT 4`).

You can also use a simple threshold (for example `0.4`) to discard weak matches.

As a starting point, many setups work well in the `0.35` to `0.45` range for cosine distance, then tune with real questions from your domain.

``` js
const searchResult = await pool.query(
  `SELECT id, content, source, embedding <=> $1::vector AS distance
   FROM rag_chunks
   ORDER BY embedding <=> $1::vector
   LIMIT 4`,
  [pgvector.toSql(queryEmbedding)]
);

const contextChunks = searchResult.rows.map((row) => row.content);
```
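For intuition, the ranking the query performs can be sketched in memory. The `<=>` operator returns cosine distance, which is 1 minus cosine similarity. The two-dimensional vectors, `sampleRows`, and `topK` below are hypothetical illustration values, not part of the database code.

``` js
// Cosine distance: 0 for the same direction, 1 for orthogonal vectors.
function cosineDistance(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / Math.sqrt(normA * normB);
}

// Equivalent in spirit to ORDER BY embedding <=> query LIMIT k.
function topK(queryVector, rows, k) {
  return rows
    .map((row) => ({
      content: row.content,
      distance: cosineDistance(queryVector, row.embedding),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

const sampleRows = [
  { content: 'pgvector docs', embedding: [1, 0] },
  { content: 'cooking tips', embedding: [0, 1] },
  { content: 'postgres indexing', embedding: [1, 1] },
];

const nearest = topK([1, 0], sampleRows, 2);
console.log(nearest.map((row) => row.content)); // pgvector docs, then postgres indexing
```

This sketch scans every row, so it only illustrates the ordering; the SQL query is what you run against real data.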

Threshold filtering example:

``` js
const DISTANCE_THRESHOLD = 0.4;
const filteredChunks = searchResult.rows
  .filter((row) => Number(row.distance) <= DISTANCE_THRESHOLD)
  .map((row) => row.content);
```

If no chunks pass the threshold, skip answer generation and return a fallback message:

``` js
if (filteredChunks.length === 0) {
  console.log('I do not have enough context to answer this.');
  process.exit(0);
}
```

Use retrieved chunks as grounded context for the final model call.

``` js
const context = contextChunks.join('\n\n---\n\n');

const answer = await client.responses.create({
  model: 'gpt-5.5',
  instructions:
    'Answer only from the provided context. If context is insufficient, respond with: I do not have enough context to answer this.',
  input: `Context:\n${context}\n\nQuestion: How does pgvector semantic search work?`,
});

console.log(answer.output_text);
```
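In a server or library, exiting the process on empty context is usually not an option. One way to combine the fallback with prompt assembly is sketched below; `buildPrompt` and `FALLBACK_MESSAGE` are hypothetical names, and the prompt layout simply mirrors the example above.

``` js
const FALLBACK_MESSAGE = 'I do not have enough context to answer this.';

// Return the model input, or null when there is no context to ground the answer.
function buildPrompt(chunks, question) {
  if (chunks.length === 0) {
    return null;
  }
  const context = chunks.join('\n\n---\n\n');
  return `Context:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt([], 'How does pgvector semantic search work?');

// With no chunks, return the fallback instead of calling the model.
console.log(prompt === null ? FALLBACK_MESSAGE : prompt);
```

When `buildPrompt` returns a string, it can be passed as `input` to the model call; when it returns `null`, the caller responds with the fallback message and skips generation.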

Runnable scripts for this post live in the `rag-openai-embeddings-pgvector-demo` folder in the private demos repository. Get access via [code demos](https://sevic.dev/demos).
