# Building a Production-Ready RAG Application with LangChain, pgvector, and Gemini

> Source: <https://dev.to/adityakmr7/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini-n25>
> Published: 2026-06-17 18:59:52+00:00

Retrieval-Augmented Generation (RAG) is a pattern for building applications that can query, understand, and extract insights from your own documents (such as PDFs, resumes, and reports) by passing relevant passages to a Large Language Model (LLM) as context.

This guide walks you through building a complete RAG API step-by-step, explaining the architecture, code, and debugging learnings along the way.

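A typical RAG pipeline is divided into two parts: ingestion, where documents are loaded, chunked, embedded, and stored (`app/ingest.py`), and retrieval and generation, where the chunks most similar to a question are fetched and passed to the LLM as context (`app/chat.py`).

The vector store used here is PostgreSQL with the `pgvector` extension.

`requirements.txt`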

Dependencies include FastAPI (API framework), LangChain (orchestration library), Google GenAI integration, and database drivers for PostgreSQL/pgvector.

```
fastapi
uvicorn
python-dotenv

langchain
langchain-community
langchain-postgres
langchain-google-genai
langchain-text-splitters

pypdf

psycopg[binary]
pgvector
```

`.env` (Environment Variables)

Stores the database credentials and the Google AI Studio API key.

```
# "+psycopg" selects the psycopg 3 driver installed via psycopg[binary]
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/ragdb
GOOGLE_API_KEY=YOUR_GEMINI_API_KEY
```
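SQLAlchemy-style URLs can name the driver after a `+` in the scheme, and a bare `postgresql://` URL falls back to psycopg2, which is not in `requirements.txt`. A minimal sketch, assuming psycopg 3 is the driver you want (as `psycopg[binary]` suggests), that makes the driver explicit. The helper name `with_psycopg_driver` is hypothetical and not part of the original project:

```python
def with_psycopg_driver(url: str) -> str:
    """Return `url` with an explicit psycopg 3 driver in the scheme.

    Hypothetical helper for illustration only.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"not a URL: {url!r}")
    # "postgres" is a common alias; SQLAlchemy expects "postgresql".
    if scheme in ("postgres", "postgresql"):
        return f"postgresql+psycopg://{rest}"
    # A driver is already named (e.g. postgresql+psycopg); leave it alone.
    return url
```

Applying it to the value read from `.env` before handing the URL to SQLAlchemy or `PGVector` keeps the file tolerant of either spelling.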

`app/config.py`

Loads variables from `.env` to make them accessible across modules.

``` python
from dotenv import load_dotenv
import os

load_dotenv()

GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
DATABASE_URL = os.getenv("DATABASE_URL")
```
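`os.getenv` returns `None` when a variable is missing, so a typo in `.env` only surfaces later as a confusing connection or authentication error. A small sketch of a fail-fast alternative follows. The function name `require_env` is an assumption for illustration, not something the original project defines:

```python
import os
from typing import Mapping


def require_env(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the value of `name`, or raise if it is unset or blank.

    Hypothetical helper; `environ` is injectable so it can be tested
    without touching the real process environment.
    """
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

In `app/config.py` this could replace the two `os.getenv(...)` calls, for example `DATABASE_URL = require_env("DATABASE_URL")`, so the application stops at import time with a clear message.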

`app/database.py`

Sets up the SQLAlchemy engine instance to connect to PostgreSQL.

``` python
from sqlalchemy import create_engine
from config import DATABASE_URL

# Reuse the URL loaded in config.py instead of reading .env a second time
engine = create_engine(DATABASE_URL)
```

`app/vector_store.py`

Instantiates the embeddings model (`models/gemini-embedding-2`) and connects it to PostgreSQL via `PGVector` to index and search embeddings.

``` python
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_postgres import PGVector
from config import DATABASE_URL

# Set up the embeddings generator
embeddings = GoogleGenerativeAIEmbeddings(
  model="models/gemini-embedding-2"
)

# Connect embeddings to PostgreSQL collection
vector_store = PGVector(
  embeddings=embeddings,
  collection_name="financial_documents",
  connection=DATABASE_URL,
  use_jsonb=True,
)
```
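To make "search embeddings" concrete, here is a pure-Python sketch of nearest-neighbour ranking, assuming cosine distance is the metric in use (pgvector exposes it as the `<=>` operator). The function names and the toy two-dimensional vectors are illustrative assumptions. Real embeddings have hundreds or thousands of dimensions, and the database does this work for you:

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], stored: dict[str, Sequence[float]], k: int = 3) -> list[str]:
    """Return the names of the k stored vectors closest to `query`."""
    ranked = sorted(stored, key=lambda name: cosine_distance(query, stored[name]))
    return ranked[:k]
```

The `k` argument plays the same role as `k=3` in a `similarity_search` call: it caps how many of the closest chunks are returned.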

`app/ingest.py`

This script reads the PDF, sanitizes the text, chunks it, enriches the chunks with metadata, and saves the vectors into the database.

> [!NOTE]
>
> PostgreSQL NUL constraint: Standard Python PDF loaders might parse special formatting as `\x00` (NUL characters). Since PostgreSQL utilizes C-style null-terminated strings, attempting to write raw `\x00` results in a write error. We explicitly remove them before chunking.
>
> Context Enrichment: If chunking splits the document, text in the middle of pages may lack context (like the candidate's name). Prepending `"Candidate: {title}"` to every chunk helps search queries containing the subject name rank these chunks accurately.

``` python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from vector_store import vector_store

def ingest_pdf(pdf_path: str):
    # 1. Load document
    loader = PyPDFLoader(pdf_path)
    docs = loader.load()

    # 2. Sanitize null bytes (\x00) which PostgreSQL does not support
    for doc in docs:
        doc.page_content = doc.page_content.replace("\x00", "")

    # 3. Chunk the document
    splitter = RecursiveCharacterTextSplitter(
      chunk_size=1000,
      chunk_overlap=200
    )
    chunks = splitter.split_documents(docs)

    # 4. Context Enrichment
    for chunk in chunks:
        title = chunk.metadata.get("title") or "Aditya Kumar"
        chunk.page_content = f"Candidate: {title}\n{chunk.page_content}"

    # 5. Insert into pgvector
    vector_store.add_documents(documents=chunks)
    print(f"Stored {len(chunks)} chunks")

if __name__ == "__main__":
    ingest_pdf("documents/aditya_resume.pdf")
```
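The `chunk_size=1000` and `chunk_overlap=200` settings are easier to reason about with a simplified model. The sketch below uses fixed character windows, which is an assumption made for illustration: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, so its actual boundaries will differ. The function name `chunk_with_overlap` is hypothetical:

```python
def chunk_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split `text` into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only, not the algorithm LangChain uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the settings from `ingest_pdf`, each window advances by 800 characters, so a sentence that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the 200-character overlap.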

`app/chat.py`

Queries the database for matching chunks, constructs the prompt context, feeds it to the LLM (`gemini-2.5-flash`), and compiles the source page metadata.

``` python
from langchain_google_genai import ChatGoogleGenerativeAI
from vector_store import vector_store

# Initialize Chat Model
llm = ChatGoogleGenerativeAI(
  model="gemini-2.5-flash"
)

def ask_question(question: str):
    # 1. Query vector database for top-3 most similar chunks
    docs = vector_store.similarity_search(question, k=3)

    # 2. Combine chunk text contents into single context block
    context = "\n\n".join(doc.page_content for doc in docs)

    # 3. Instruct the model to answer only from the retrieved context
    prompt = f"""
    You are a resume assistant.
    Answer ONLY from the provided context.
    If the answer does not exist in the context, say "I don't know".

    Context:
    {context}

    Question: {question}
    """

    # 4. Request generation from LLM
    response = llm.invoke(prompt)

    return {
        "answer": response.content,
        "source": [
            {
                "page": doc.metadata.get("page"),
                "source": doc.metadata.get("source")
            }
            for doc in docs
        ]
    }
```
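Separating prompt assembly from the LLM call makes this step testable without a database or an API key. The sketch below is one possible arrangement, not the author's implementation. `Chunk` is a hypothetical stand-in for a LangChain `Document`, and `build_prompt` and `unique_sources` are illustrative names:

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a LangChain Document (hypothetical, for illustration)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble the grounded prompt from the retrieved chunks."""
    context = "\n\n".join(c.page_content for c in chunks)
    return (
        "You are a resume assistant.\n"
        "Answer ONLY from the provided context.\n"
        'If the answer does not exist in the context, say "I don\'t know".\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def unique_sources(chunks: list[Chunk]) -> list[dict]:
    """List each (source, page) pair once, in retrieval order."""
    seen = set()
    sources = []
    for c in chunks:
        key = (c.metadata.get("source"), c.metadata.get("page"))
        if key not in seen:
            seen.add(key)
            sources.append({"page": key[1], "source": key[0]})
    return sources
```

Deduplicating matters here because overlapping chunks from the same page are often retrieved together, which would otherwise repeat the same page in the response.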

`app/main.py`

Hosts the FastAPI server. It appends the `app/` directory to `sys.path` so that imports such as `from chat import ask_question` resolve when the server is started from the project root.

``` python
import sys
import os
# Ensure the root directory imports resolve correctly
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

from fastapi import FastAPI
from pydantic import BaseModel
from chat import ask_question

app = FastAPI()

class QuestionRequest(BaseModel):
    question: str

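# POST, because the question is sent as a JSON request body
@app.post("/chat")
def ask(request: QuestionRequest):
    # Plain `def`: FastAPI runs it in a worker thread, so the blocking
    # database and LLM calls do not stall the event loop
    return ask_question(request.question)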
```

Debugging learnings from building this pipeline:

- Check which models your API key can access before hard-coding a model name (for example with `client.models.list()`).
- Calling `gemini-2.5-pro` on unpaid tiers can result in `429 RESOURCE_EXHAUSTED` (quota limit of 0). Switching to `gemini-2.5-flash` provides a cost-effective, high-quota alternative.
- PDF loaders can emit `\x00` markers. PostgreSQL rejects text containing these bytes, so a simple `.replace('\x00', '')` filter before writing is mandatory.
- For a query such as `"Where does Aditya Kumar work?"`, chunks containing `"Aditya Kumar"` (like the footer/header) rank high, while relevant work history chunks lacking his name rank extremely low.
- Context enrichment (prepending `"Candidate: Aditya Kumar"` to each chunk) helps the retriever find the correct chunk and enables accurate generation.
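For transient rate limiting, as opposed to a quota of 0 where only switching models helps, a retry with exponential backoff is a common mitigation. The sketch below is generic and makes no claim about how the Gemini SDK reports errors. Matching on the text `RESOURCE_EXHAUSTED` or `429` in the exception message is an assumption, and both function names are hypothetical:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def looks_like_quota_error(exc: Exception) -> bool:
    """Heuristic (assumption): detect rate-limit errors by their message."""
    message = str(exc)
    return "RESOURCE_EXHAUSTED" in message or "429" in message


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool] = looks_like_quota_error,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying retryable failures with delays of 1x, 2x, 4x, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise  # give up: out of attempts, or not a rate-limit error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In `ask_question`, the generation step could then be wrapped as `call_with_backoff(lambda: llm.invoke(prompt))`. The `sleep` parameter is injectable so the behaviour can be tested without real waiting.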