Building a Production-Ready RAG Application with LangChain, pgvector, and Gemini


[ARTICLE · art-31624] src=dev.to ↗ pub=2026-06-17T18:59Z topic=large-language-models verified=true sentiment=↑ positive

A developer built a production-ready Retrieval-Augmented Generation (RAG) application using LangChain, pgvector, and Google's Gemini models. The application ingests PDF documents, splits them into chunks enriched with context, stores embeddings in PostgreSQL via pgvector, and queries them with Gemini to generate answers. Key learnings include handling PostgreSQL NUL character constraints and improving search relevance by prepending subject names to chunks.

Published Jun 17, 2026 · 4 min read

Retrieval-Augmented Generation (RAG) is a pattern for building applications that query, understand, and extract insights from your own documents (PDFs, resumes, reports) by supplying the relevant passages as context to a Large Language Model (LLM).

This guide walks you through building a complete RAG API step-by-step, explaining the architecture, code, and debugging learnings along the way.

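A typical RAG pipeline is divided into two parts:

Ingestion: load the PDF, split it into chunks, embed each chunk, and store the vectors in PostgreSQL.

Retrieval and generation: embed the question, fetch the most similar chunks, and pass them to Gemini as context for the answer.

Both parts rely on PostgreSQL with the pgvector extension.

requirements.txt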

Dependencies include FastAPI (API framework), LangChain (orchestration library), Google GenAI integration, and database drivers for PostgreSQL/pgvector.

fastapi
uvicorn
python-dotenv

langchain
langchain-community
langchain-postgres
langchain-google-genai
langchain-text-splitters

pypdf

psycopg[binary]
pgvector

.env (Environment Variables)

Stores the database credentials and the Google AI Studio API key.

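# The +psycopg suffix selects the psycopg 3 driver installed by psycopg[binary];
# a bare postgresql:// URL makes SQLAlchemy look for psycopg2 instead.
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/ragdb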
GOOGLE_API_KEY=YOUR_GEMINI_API_KEY

app/config.py

Loads variables from .env to make them accessible across modules.

from dotenv import load_dotenv
import os

load_dotenv()

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
DATABASE_URL = os.getenv("DATABASE_URL")
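os.getenv returns None when a variable is missing, so a typo in .env only surfaces later as a confusing connection or authentication error. One possible guard, sketched here with a hypothetical helper named require_env (not part of the original project), is to fail fast at import time:

```python
import os


def require_env(name: str, environ=os.environ) -> str:
    """Return the variable's value, or raise a clear error if it is unset or empty.

    `require_env` is a hypothetical helper for illustration; `environ` is
    injectable so the function can be exercised without touching real settings.
    """
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

In app/config.py this could replace the bare lookups, for example GOOGLE_API_KEY = require_env("GOOGLE_API_KEY").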

app/database.py

Sets up the SQLAlchemy engine instance to connect to PostgreSQL.

from sqlalchemy import create_engine
from dotenv import load_dotenv
import os

load_dotenv()

engine = create_engine(
  os.getenv("DATABASE_URL")
)
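SQLAlchemy chooses its database driver from the URL scheme. A bare postgresql:// URL selects psycopg2, while requirements.txt installs psycopg 3, which is addressed as postgresql+psycopg://. If the URL comes from an environment you do not control, a small normalizing step may help. The helper name use_psycopg3 below is an assumption made for this sketch:

```python
def use_psycopg3(url: str) -> str:
    """Rewrite a plain PostgreSQL URL so SQLAlchemy selects the psycopg 3 driver.

    Hypothetical helper: URLs that already name a driver (or are not
    PostgreSQL URLs at all) are returned unchanged.
    """
    for prefix in ("postgresql://", "postgres://"):
        if url.startswith(prefix):
            return "postgresql+psycopg://" + url[len(prefix):]
    return url
```

The result can then be passed to create_engine in place of the raw value.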

app/vector_store.py

Instantiates the embeddings model (models/gemini-embedding-2) and connects it to PostgreSQL via PGVector to index and search embeddings.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_postgres import PGVector
from config import DATABASE_URL

embeddings = GoogleGenerativeAIEmbeddings(
  model="models/gemini-embedding-2"
)

vector_store = PGVector(
  embeddings=embeddings,
  collection_name="financial_documents",
  connection=DATABASE_URL,
  use_jsonb=True,
)
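Conceptually, a vector store ranks stored embeddings by how close they are to the query embedding and returns the top few. The toy in-memory version below uses cosine similarity on plain Python lists. It is only an illustration of the idea: pgvector performs the comparison inside PostgreSQL, the metric depends on how the store is configured, and the function names here are made up for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k=3):
    """Return the texts of the k rows most similar to query_vec.

    rows is a list of (text, vector) pairs standing in for a database table.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```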

app/ingest.py

This script reads the PDF, sanitizes the text, chunks it, enriches the chunks with metadata, and saves the vectors into the database.

[!NOTE]

PostgreSQL NUL constraint: Standard Python PDF loaders might parse special formatting as \x00 (NUL characters). Since PostgreSQL uses C-style null-terminated strings, attempting to write a raw \x00 results in a write error. We explicitly remove them before chunking.

Context Enrichment: When chunking splits the document, text in the middle of pages may lack context (like the candidate's name). Prepending "Candidate: {title}" to every chunk ensures that search queries containing the subject name rank these chunks accurately.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from vector_store import vector_store

def ingest_pdf(pdf_path: str):
    # Load the PDF; PyPDFLoader returns one Document per page.
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()

    for doc in docs:
        doc.page_content = doc.page_content.replace("\x00", "")

    splitter = RecursiveCharacterTextSplitter(
      chunk_size=1000,
      chunk_overlap=200
    )
    chunks = splitter.split_documents(docs)

    for chunk in chunks:
        title = chunk.metadata.get("title") or "Aditya Kumar"
        chunk.page_content = f"Candidate: {title}\n{chunk.page_content}"

    vector_store.add_documents(documents=chunks)
    print(f"Stored {len(chunks)} chunks")

if __name__ == "__main__":
    ingest_pdf("documents/aditya_resume.pdf")
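To see what chunk_size and chunk_overlap control, here is a deliberately simplified sliding-window chunker. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs, lines, and spaces); the function name sliding_chunks is an assumption for this sketch, and it only shows how consecutive chunks share text.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int):
    """Split text into fixed-size windows where each window repeats the
    last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With chunk_size=1000 and chunk_overlap=200, as in the ingest script, each new window starts 800 characters after the previous one, so a sentence cut at one boundary still appears whole in the neighbouring chunk.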

app/chat.py

Queries the database for matching chunks, constructs the prompt context, feeds it to the LLM (gemini-2.5-flash), and compiles the source page metadata.

from langchain_google_genai import ChatGoogleGenerativeAI
from vector_store import vector_store

llm = ChatGoogleGenerativeAI(
  model="gemini-2.5-flash"
)

def ask_question(question: str):
    docs = vector_store.similarity_search(question, k=3)

    context = "\n\n".join(doc.page_content for doc in docs)

    prompt = f"""
    You are a resume assistant
    Answer ONLY from the provided context
    If the answer does not exist in the context say "I don't know".
    Context:{context}
    Question:{question}
    """

    response = llm.invoke(prompt)

    return {
        "answer": response.content,
        "source": [
            {
                "page": doc.metadata.get("page"),
                "source": doc.metadata.get("source")
            }
            for doc in docs
        ]
    }
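Because the prompt is an indented triple-quoted f-string, the leading spaces on every line are sent to the model as well. One way to keep the prompt clean and easy to test is to build it in a separate function. This is a sketch under that assumption; build_prompt and INSTRUCTIONS are illustrative names, and the wording mirrors the prompt used in chat.py.

```python
INSTRUCTIONS = (
    "You are a resume assistant.\n"
    "Answer ONLY from the provided context.\n"
    'If the answer does not exist in the context, say "I don\'t know".'
)


def build_prompt(context_chunks, question: str) -> str:
    """Assemble the instructions, the retrieved chunks, and the question
    into a single prompt string with no stray indentation."""
    context = "\n\n".join(context_chunks)
    return f"{INSTRUCTIONS}\n\nContext:\n{context}\n\nQuestion: {question}"
```

Inside ask_question this could be called as build_prompt([doc.page_content for doc in docs], question) before invoking the LLM.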

app/main.py

Hosts the FastAPI server. It appends the current directory path dynamically to resolve imports cleanly if run from the root project directory.

import sys
import os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from fastapi import FastAPI
from pydantic import BaseModel
from chat import ask_question

app = FastAPI()

class QuestionRequest(BaseModel):
    question: str

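# POST, because the question arrives in a JSON request body.
@app.post("/chat")
def ask(request: QuestionRequest):
    # A plain def lets FastAPI run the blocking LLM call in its threadpool.
    return ask_question(request.question)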

A few debugging learnings from the build:

Check which models your API key can access before choosing one (for example, with client.models.list()).

Calling gemini-2.5-pro on unpaid tiers can result in 429 RESOURCE_EXHAUSTED (quota limit of 0). Switching to gemini-2.5-flash provides a cost-effective, high-quota alternative.

PDF text extraction can produce \x00 markers. PostgreSQL fails when these raw strings are written to the database, so a simple .replace('\x00', '') filter is mandatory.

For a query such as "Where does Aditya Kumar work?", chunks containing "Aditya Kumar" (like the footer/header) rank high, while relevant work history chunks lacking his name rank extremely low.

Context enrichment (prepending "Candidate: Aditya Kumar" to each chunk) forces the system to find the correct chunk and enables accurate generation.
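The ranking effect of prepending the subject name can be illustrated without any embeddings. The toy score below is plain word overlap, a loud simplification standing in for vector similarity, and the sample chunks (including "Example Corp") are invented for the demonstration.

```python
import re


def tokens(text: str) -> set:
    """Lowercased alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query: str, chunk: str) -> float:
    """Fraction of query words that also appear in the chunk."""
    q = tokens(query)
    return len(q & tokens(chunk)) / len(q) if q else 0.0


def enrich(chunk: str, subject: str) -> str:
    """Prepend the subject line, as the ingest script does for every chunk."""
    return f"Candidate: {subject}\n{chunk}"


query = "Where does Aditya Kumar work?"
header = "Aditya Kumar | email | phone"                       # hypothetical header chunk
work = "Work experience: Software Engineer at Example Corp"   # hypothetical work chunk

raw_header = overlap_score(query, header)
raw_work = overlap_score(query, work)
enriched_work = overlap_score(query, enrich(work, "Aditya Kumar"))
```

Before enrichment the header outscores the work-history chunk because only the header contains the name. After enrichment the work-history chunk matches both the name and the topic word, so it moves to the top.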


Read original on dev.to → dev.to/adityakmr7/building-a-production-ready-ra…
