[ARTICLE · art-31624] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

Building a Production-Ready RAG Application with LangChain, pgvector, and Gemini

A developer built a production-ready Retrieval-Augmented Generation (RAG) application using LangChain, pgvector, and Google's Gemini models. The application ingests PDF documents, splits them into chunks enriched with context, stores embeddings in PostgreSQL via pgvector, and queries them with Gemini to generate answers. Key learnings include handling PostgreSQL NUL character constraints and improving search relevance by prepending subject names to chunks.

4 min read · published Jun 17, 2026

Retrieval-Augmented Generation (RAG) is a pattern for building applications that can query, understand, and extract insights from your own documents (like PDFs, resumes, and reports) by retrieving the relevant passages and feeding them as context to a Large Language Model (LLM).

This guide walks you through building a complete RAG API step-by-step, explaining the architecture, code, and debugging learnings along the way.

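A typical RAG pipeline is divided into two parts:

Ingestion: load the PDF, split it into chunks, embed each chunk, and store the vectors in PostgreSQL through the pgvector extension.
Retrieval and generation: embed the question, fetch the most similar chunks from pgvector, and pass them to Gemini as context for the answer.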
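The two phases can be sketched without any external services. The example below is only an illustration: the bag-of-words `embed` function is a stand-in for a real embedding model, the in-memory list stands in for pgvector, and the sample sentences are invented for the demo.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ingest(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Phase 1: embed every chunk and keep (text, vector) pairs as the index."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 1) -> list[str]:
    """Phase 2: embed the question and return the k most similar chunks."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

index = ingest([
    "Worked as a backend engineer building payment APIs.",
    "Studied computer science and graduated in 2020.",
])
top = retrieve(index, "Which APIs did the engineer build?")
print(top)
```

In the real application, the retrieved chunks are then placed into the prompt that is sent to the LLM.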

requirements.txt

Dependencies include FastAPI (API framework), LangChain (orchestration library), Google GenAI integration, and database drivers for PostgreSQL/pgvector.

fastapi
uvicorn
python-dotenv

langchain
langchain-community
langchain-postgres
langchain-google-genai
langchain-text-splitters

pypdf

psycopg[binary]
pgvector

.env (Environment Variables)

Store database credentials and the Google AI Studio API key.

DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/ragdb
GOOGLE_API_KEY=YOUR_GEMINI_API_KEY

app/config.py

Loads variables from .env to make them accessible across modules.

from dotenv import load_dotenv
import os

load_dotenv()

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
DATABASE_URL = os.getenv("DATABASE_URL")
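Note that `os.getenv` returns `None` when a variable is missing, so a typo in `.env` only surfaces later as a less obvious connection or authentication error. One possible guard, sketched here with a hypothetical helper named `require_env` (not part of the original project), is to fail fast at import time:

```python
import os

def require_env(name: str) -> str:
    """Return the variable's value, or raise a clear error if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Possible usage in app/config.py, after load_dotenv():
# GOOGLE_API_KEY = require_env("GOOGLE_API_KEY")
# DATABASE_URL = require_env("DATABASE_URL")
```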

app/database.py

Sets up the SQLAlchemy engine instance to connect to PostgreSQL.

from sqlalchemy import create_engine
from dotenv import load_dotenv
import os

load_dotenv()

engine = create_engine(
  os.getenv("DATABASE_URL")
)

app/vector_store.py

Instantiates the embeddings model (models/gemini-embedding-2) and connects it to PostgreSQL via PGVector to index and search embeddings.

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_postgres import PGVector
from config import DATABASE_URL

embeddings = GoogleGenerativeAIEmbeddings(
  model="models/gemini-embedding-2"
)

vector_store = PGVector(
  embeddings=embeddings,
  collection_name="financial_documents",
  connection=DATABASE_URL,
  use_jsonb=True,
)

app/ingest.py

This script reads the PDF, sanitizes the text, chunks it, enriches the chunks with metadata, and saves the vectors into the database.

[!NOTE]

PostgreSQL NUL constraint: Standard Python PDF loaders might parse special formatting as \x00 (NUL characters). Since PostgreSQL uses C-style null-terminated strings, its text columns cannot store \x00, and attempting to write it results in a write error. We explicitly remove these characters before chunking.

Context Enrichment: Once the document is split into chunks, text from the middle of a page may lack context (like the candidate's name). Prepending "Candidate: {title}" to every chunk helps search queries containing the subject name rank these chunks accurately.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from vector_store import vector_store

def ingest_pdf(pdf_path: str):
    # PyPDFLoader returns one Document per page
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()

    for doc in docs:
        doc.page_content = doc.page_content.replace("\x00", "")

    splitter = RecursiveCharacterTextSplitter(
      chunk_size=1000,
      chunk_overlap=200
    )
    chunks = splitter.split_documents(docs)

    for chunk in chunks:
        title = chunk.metadata.get("title") or "Aditya Kumar"
        chunk.page_content = f"Candidate: {title}\n{chunk.page_content}"

    vector_store.add_documents(documents=chunks)
    print(f"Stored {len(chunks)} chunks")

if __name__ == "__main__":
    ingest_pdf("documents/aditya_resume.pdf")
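To build intuition for the `chunk_size` and `chunk_overlap` parameters used above, here is a simplified fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); the function name `sliding_chunks` is made up for this illustration.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows where neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.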

app/chat.py

Queries the database for matching chunks, constructs the prompt context, feeds it to the LLM (gemini-2.5-flash), and compiles the source page metadata.

from langchain_google_genai import ChatGoogleGenerativeAI
from vector_store import vector_store

llm = ChatGoogleGenerativeAI(
  model="gemini-2.5-flash"
)

def ask_question(question: str):
    docs = vector_store.similarity_search(question, k=3)

    context = "\n\n".join(doc.page_content for doc in docs)

    prompt = f"""
    You are a resume assistant
    Answer ONLY from the provided context
    If the answer does not exist in the context say "I don't know".
    Context:{context}
    Question:{question}
    """

    response = llm.invoke(prompt)

    return {
        "answer": response.content,
        "source": [
            {
                "page": doc.metadata.get("page"),
                "source": doc.metadata.get("source")
            }
            for doc in docs
        ]
    }
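One possible refinement, shown here as a sketch rather than as part of the original code, is to move prompt construction into a pure function so it can be unit-tested without calling the LLM or the database. The name `build_prompt` is hypothetical.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble the grounded prompt from the retrieved chunk texts and the user question."""
    context = "\n\n".join(chunks)
    return (
        "You are a resume assistant.\n"
        "Answer ONLY from the provided context.\n"
        'If the answer does not exist in the context, say "I don\'t know".\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt("Where does the candidate work?", ["chunk one", "chunk two"])
print(prompt)
```

Inside `ask_question`, the chunk texts would come from `doc.page_content` for each retrieved document.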

app/main.py

Hosts the FastAPI server. It appends the current directory path dynamically to resolve imports cleanly if run from the root project directory.

import sys
import os
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from fastapi import FastAPI
from pydantic import BaseModel
from chat import ask_question

app = FastAPI()

class QuestionRequest(BaseModel):
    question: str

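# POST, because the question arrives in a JSON request body.
@app.post("/chat")
def ask(request: QuestionRequest):
    # Plain def: FastAPI runs it in a threadpool, so the blocking
    # LLM call does not stall the event loop.
    return ask_question(request.question)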
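A client call might look like the sketch below. It assumes the server was started with something like `uvicorn app.main:app` on the default `http://localhost:8000`, and that the `/chat` route accepts a JSON body via POST (the conventional method for a request body); adjust `method` and the base URL if your setup differs. The helper name `build_chat_request` is made up for this example.

```python
import json
import urllib.request

def build_chat_request(
    question: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a JSON request matching the QuestionRequest model's single field."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_chat_request("Where does Aditya Kumar work?")

# To actually send it (requires the server to be running):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```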

Key learnings

Model availability: Check which models your API key can actually access (for example, with client.models.list()).

Free-tier quota: Calling gemini-2.5-pro on unpaid tiers can result in 429 RESOURCE_EXHAUSTED (quota limit of 0). Switching to gemini-2.5-flash provides a cost-effective, high-quota alternative.

NUL characters: PDF loaders can emit \x00 markers. PostgreSQL rejects writes of these raw strings, so a simple .replace('\x00', '') filter is mandatory.

Context enrichment: For a query like "Where does Aditya Kumar work?", chunks containing "Aditya Kumar" (like the footer/header) rank high, while relevant work history chunks lacking his name rank extremely low. Enriching the chunks (prepending "Candidate: Aditya Kumar" to each chunk) lets the search find the correct chunk and enables accurate generation.
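The effect of prepending the subject name can be illustrated with a toy lexical score. This sketch uses Jaccard overlap on word sets instead of real embeddings, and the work-history sentence is invented for the demo, so it shows only the direction of the effect, not actual pgvector rankings.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set used as a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both texts."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

question = "Where does Aditya Kumar work?"
# Hypothetical chunk from the middle of a resume: it never mentions the name.
work_history = "Senior engineer at Acme Corp, leading the payments team."
enriched = f"Candidate: Aditya Kumar\n{work_history}"

score_before = jaccard(question, work_history)
score_after = jaccard(question, enriched)
print(score_before, score_after)
```

The plain chunk shares no words with the question, while the enriched chunk picks up the name and therefore scores above zero. Real embedding models also capture semantic similarity, so the gap there is usually less extreme.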
