{"slug": "building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini", "title": "Building a Production-Ready RAG Application with LangChain, pgvector, and Gemini", "summary": "A developer built a production-ready Retrieval-Augmented Generation (RAG) application using LangChain, pgvector, and Google's Gemini models. The application ingests PDF documents, splits them into chunks enriched with context, stores embeddings in PostgreSQL via pgvector, and queries them with Gemini to generate answers. Key learnings include handling PostgreSQL NUL character constraints and improving search relevance by prepending subject names to chunks.", "body_md": "Retrieval-Augmented Generation (RAG) is a pattern for building applications that query, understand, and extract insights from your own documents (such as PDFs, resumes, and reports) by retrieving the relevant passages and feeding them as context to a Large Language Model (LLM).\n\nThis guide walks you through building a complete RAG API step by step, explaining the architecture, code, and debugging lessons along the way.\n\nA typical RAG pipeline is divided into two parts:\n\n`pgvector` extension.\n\n`requirements.txt`\n\nDependencies include FastAPI (API framework), LangChain (orchestration library), Google GenAI integration, and database drivers for PostgreSQL/pgvector.\n\n```\nfastapi\nuvicorn\npython-dotenv\n\nlangchain\nlangchain-community\nlangchain-postgres\nlangchain-google-genai\nlangchain-text-splitters\n\npypdf\n\npsycopg[binary]\npgvector\n```\n\n`.env` (Environment Variables)\n\nStore database credentials and the Google AI Studio API key. The `postgresql+psycopg://` scheme selects the psycopg 3 driver that `psycopg[binary]` installs; with a bare `postgresql://` URL, SQLAlchemy defaults to psycopg2, which is not in `requirements.txt`.\n\n```\nDATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/ragdb\nGOOGLE_API_KEY=YOUR_GEMINI_API_KEY\n```\n\n`app/config.py`\n\nLoads variables from `.env` to make them accessible across modules.\n\n``` python\nfrom dotenv import load_dotenv\nimport os\n\nload_dotenv()\n\nGOOGLE_API_KEY = os.getenv(\"GOOGLE_API_KEY\")\nDATABASE_URL = 
os.getenv(\"DATABASE_URL\")\n```\n\n`app/database.py`\n\nSets up the SQLAlchemy engine instance to connect to PostgreSQL.\n\n``` python\nfrom sqlalchemy import create_engine\nfrom config import DATABASE_URL\n\n# Reuse the URL that config.py already loaded from .env\nengine = create_engine(DATABASE_URL)\n```\n\n`app/vector_store.py`\n\nInstantiates the embeddings model (`models/gemini-embedding-2`) and connects it to PostgreSQL via `PGVector` to index and search embeddings.\n\n``` python\nfrom langchain_google_genai import GoogleGenerativeAIEmbeddings\nfrom langchain_postgres import PGVector\n# Importing config runs load_dotenv(), so GOOGLE_API_KEY is in the environment\nfrom config import DATABASE_URL\n\n# Set up the embeddings generator\nembeddings = GoogleGenerativeAIEmbeddings(\n  model=\"models/gemini-embedding-2\"\n)\n\n# Connect embeddings to PostgreSQL collection\nvector_store = PGVector(\n  embeddings=embeddings,\n  collection_name=\"financial_documents\",\n  connection=DATABASE_URL,\n  use_jsonb=True,\n)\n```\n\n`app/ingest.py`\n\nThis script reads the PDF, sanitizes the text, chunks it, prefixes each chunk with the candidate's name, and saves the vectors into the database.\n\n[!NOTE]\n\nPostgreSQL NUL constraint: Standard Python PDF loaders might parse special formatting as `\\x00` (NUL characters). Since PostgreSQL utilizes C-style null-terminated strings, attempting to write raw `\\x00` results in a write error. We explicitly remove them before chunking.\n\nContext Enrichment: If chunking splits the document, text in the middle of pages may lack context (like the candidate's name). Prepending `\"Candidate: {title}\"` to every chunk helps search queries containing the subject's name match these chunks.\n\n``` python\nfrom langchain_community.document_loaders import PyPDFLoader\nfrom langchain_text_splitters import RecursiveCharacterTextSplitter\nfrom vector_store import vector_store\n\ndef ingest_pdf(pdf_path: str):\n    # 1. Load document\n    loader = PyPDFLoader(pdf_path)\n    docs = loader.load()\n\n    # 2. 
Sanitize null bytes (\\x00) which PostgreSQL does not support\n    for doc in docs:\n        doc.page_content = doc.page_content.replace(\"\\x00\", \"\")\n\n    # 3. Chunk the document\n    splitter = RecursiveCharacterTextSplitter(\n      chunk_size=1000,\n      chunk_overlap=200\n    )\n    chunks = splitter.split_documents(docs)\n\n    # 4. Context Enrichment\n    for chunk in chunks:\n        # Fall back to a hard-coded name when the PDF has no title metadata\n        title = chunk.metadata.get(\"title\") or \"Aditya Kumar\"\n        chunk.page_content = f\"Candidate: {title}\\n{chunk.page_content}\"\n\n    # 5. Insert into pgvector\n    vector_store.add_documents(documents=chunks)\n    print(f\"Stored {len(chunks)} chunks\")\n\nif __name__ == \"__main__\":\n    ingest_pdf(\"documents/aditya_resume.pdf\")\n```\n\n`app/chat.py`\n\nQueries the database for matching chunks, constructs the prompt context, feeds it to the LLM (`gemini-2.5-flash`), and compiles the source page metadata.\n\n``` python\nfrom langchain_google_genai import ChatGoogleGenerativeAI\nfrom vector_store import vector_store\n\n# Initialize Chat Model\nllm = ChatGoogleGenerativeAI(\n  model=\"gemini-2.5-flash\"\n)\n\ndef ask_question(question: str):\n    # 1. Query vector database for top-3 most similar chunks\n    docs = vector_store.similarity_search(question, k=3)\n\n    # 2. Combine chunk text contents into single context block\n    context = \"\\n\\n\".join(doc.page_content for doc in docs)\n\n    # 3. Prompt that restricts the model to the retrieved context\n    prompt = f\"\"\"\n    You are a resume assistant.\n    Answer ONLY from the provided context.\n    If the answer does not exist in the context say \"I don't know\".\n    Context: {context}\n    Question: {question}\n    \"\"\"\n\n    # 4. 
Request generation from LLM\n    response = llm.invoke(prompt)\n\n    return {\n        \"answer\": response.content,\n        \"source\": [\n            {\n                \"page\": doc.metadata.get(\"page\"),\n                \"source\": doc.metadata.get(\"source\")\n            }\n            for doc in docs\n        ]\n    }\n```\n\n`app/main.py`\n\nHosts the FastAPI server. It appends the directory containing `main.py` to `sys.path` so that sibling modules such as `chat` import cleanly when the server is run from the root project directory.\n\n``` python\nimport sys\nimport os\n# Add this file's directory (app/) to sys.path so sibling modules resolve\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\n\nfrom fastapi import FastAPI\nfrom pydantic import BaseModel\nfrom chat import ask_question\n\napp = FastAPI()\n\nclass QuestionRequest(BaseModel):\n    question: str\n\n# POST, because the question arrives in a JSON request body\n@app.post(\"/chat\")\ndef ask(request: QuestionRequest):\n    # Plain def: FastAPI runs it in a threadpool, so the blocking\n    # LLM call does not stall the event loop\n    return ask_question(request.question)\n```\n\n`client.models.list()`).\n\n`gemini-2.5-pro` on unpaid tiers can result in `429 RESOURCE_EXHAUSTED` (quota limit of 0). Switching to `gemini-2.5-flash` provides a cost-effective, high-quota alternative.\n\n`\\x00` markers. When writing these raw strings to databases, PostgreSQL will fail. 
Implementing a simple `.replace('\\x00', '')` filter is mandatory.\n\n`\"Where does Aditya Kumar work?\"`, chunks containing `\"Aditya Kumar\"` (like the footer/header) rank high, while relevant work history chunks lacking his name rank extremely low.\n\n`\"Candidate: Aditya Kumar\"` to each chunk) forces the system to find the correct chunk and enables accurate generation.", "url": "https://wpnews.pro/news/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini", "canonical_source": "https://dev.to/adityakmr7/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini-n25", "published_at": "2026-06-17 18:59:52+00:00", "updated_at": "2026-06-17 19:21:24.751043+00:00", "lang": "en", "topics": ["large-language-models", "generative-ai", "developer-tools", "natural-language-processing", "ai-products"], "entities": ["LangChain", "pgvector", "Gemini", "Google", "PostgreSQL", "FastAPI", "SQLAlchemy", "PyPDFLoader"], "alternates": {"html": "https://wpnews.pro/news/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini", "markdown": "https://wpnews.pro/news/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini.md", "text": "https://wpnews.pro/news/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini.txt", "jsonld": "https://wpnews.pro/news/building-a-production-ready-rag-application-with-langchain-pgvector-and-gemini.jsonld"}}