{"slug": "building-a-rag-system-from-scratch-wrap-up-and-what-comes-next", "title": "Building a RAG System from Scratch — Wrap-up and What Comes Next", "summary": "A developer built a complete RAG system from scratch using pgvector, Gemini embeddings, and MCP, covering setup, indexing, ingestion, search, and multi-step agents. The series concluded with deployment on Render and Supabase, and outlined next steps including evaluation, observability, security, and governance.", "body_md": "In this final article, we'll recap what we built across the series, consolidate the design decisions, and point to where to go next.\n\nStarting from a blank Python project, we built a complete AI system step by step:\n\n```\n01_setup_db.py       pgvector table + extension\n02_create_index.py   HNSW index (m=16, ef_construction=64)\n03_ingest.py         Embed documents → store in pgvector\n04_search.py         Cosine similarity search\n05_rag.py            Full RAG pipeline\n\n06_tool_basic.py     LLM decides whether to search\n07_tool_multi.py     LLM routes between multiple tools\n08_tool_agent.py     Multi-step agentic loop\n\n09_agent_basic.py    ReAct pattern\n10_agent_memory.py   Persistent memory across sessions\n11_agent_planner.py  Plan → Execute → Evaluate\n\nmcp_server/\n  server.py          MCP server (stdio, Claude Desktop)\n  server_http.py     MCP server (HTTP)\n  server_render.py   MCP server (Render deployment)\n\n12_mcp_agent.py      Agent via MCP (local)\n13_mcp_http_agent.py Agent via MCP (cloud)\n```\n\nHere are the key design decisions behind it.\n\nWe chose pgvector because it integrates with an existing PostgreSQL database, lets you combine SQL and vector search in a single query, and comfortably handles millions of documents. Start with it, and migrate to a dedicated vector database only when you have evidence that you need to.\n\n`gemini-embedding-001` outputs 3072 dimensions by default, but pgvector's HNSW index has a hard limit of 2,000 dimensions for the `vector` type. Requesting 768 dimensions (via `output_dimensionality`) stays well within that limit with negligible quality loss.\n\nFor `task_type`, use `RETRIEVAL_DOCUMENT` when storing documents and `RETRIEVAL_QUERY` when embedding search queries. 
The Gemini embedding model is trained to map queries *toward* documents, not to the same point, so using the same task type for both sides degrades retrieval accuracy.\n\nWe chose HNSW over IVFFlat because it needs no training step (IVFFlat clusters the existing rows when the index is built), delivers consistent recall as the data grows, and is faster at query time. IVFFlat builds faster and uses less memory, so it is only worth considering under tight memory constraints.\n\nThe LLM selects tools based on their `description` field. Precise, distinguishing descriptions produce correct tool selection; vague or overlapping ones produce unpredictable behavior.\n\nEach tool call and its result are appended to `contents`, and the LLM reads the full history on every step. This is how multi-step reasoning works.\n\nMCP turns hardcoded tool functions into a standalone server. Claude Desktop, Gemini agents, and any future MCP client can connect to the same server without duplicating tool definitions.\n\nRender's free web service hosts the MCP server, and Supabase's free tier hosts pgvector. Connecting through Supabase's Connection Pooler (port 6543) is mandatory: Supabase's direct connection on port 5432 uses IPv6, which Render doesn't support.\n\n```\nLocal:\n  Claude Desktop\n      ↓ stdio\n  mcp_server/server.py\n      ↓ psycopg2\n  pgvector (Docker)\n\nCloud:\n  Python agent (13_mcp_http_agent.py)\n      ↓ HTTPS\n  Render (server_render.py)\n      ↓ PostgreSQL + SSL (port 6543)\n  Supabase (pgvector)\n      ↓\n  Gemini Embedding + LLM\n```\n\nThis series focused on getting a working RAG system off the ground. Several important topics were out of scope:\n\n**Evaluation (Evals)** — How do you know whether your RAG system is actually working? You need automated quality measurement: Context Recall, Answer Relevancy, and Faithfulness scoring.\n\n**Observability** — When something goes wrong in production, how do you debug it? Tracing each step with a tool like Langfuse tells you exactly where latency or quality issues originate.\n\n**Security** — How do you handle adversarial inputs? 
Prompt injection, jailbreaks, and PII leakage are real threats in any public-facing RAG system.\n\n**MLOps / LLMOps** — How do you ship changes safely? Prompt versioning, CI/CD quality gates, and API cost tracking become essential once the system is in production.\n\n**Fine-tuning** — When the base model doesn't behave the way you need, LoRA fine-tuning lets you adapt it to your domain with surprisingly little data and compute.\n\n**Multi-Agent Systems** — When a single agent isn't enough, orchestrator-worker patterns distribute work across specialized agents.\n\n**Governance** — The EU AI Act entered into force in August 2024, and its obligations phase in between 2025 and 2027. For a chatbot system, compliance typically means AI disclosure notices, audit logging, and a documented risk assessment.\n\nAll of these are covered in **Vol.2** of this series.\n\nThe second series picks up where this one leaves off, taking a working RAG system and making it production-grade.\n\n[AI Production Operations Guide](https://zenn.dev/hkame/books/ai-architect-production) *(Japanese; English Dev.to series coming soon)*\n\n| Chapter | Topic |\n|---|---|\n| 1 | What \"production\" actually means |\n| 2 | Evals: automated quality measurement |\n| 3 | Observability with Langfuse v4 |\n| 4 | Security: guardrails and prompt injection defense |\n| 5 | MLOps / LLMOps: CI/CD pipeline |\n| 6 | Fine-tuning with LoRA |\n| 7 | Multi-Agent: orchestrator-worker pattern |\n| 8 | Governance: EU AI Act compliance |\n| 9 | Wrap-up |\n\nEverything built in this series is in one repository:\n\n[github.com/qameqame/pgvector-tutorial](https://github.com/qameqame/pgvector-tutorial)\n\nThe README covers setup, directory structure, and the reasoning behind each design decision.\n\nThanks for following along. 
If you found this useful, the GitHub repo and Vol.2 are the best places to continue.", "url": "https://wpnews.pro/news/building-a-rag-system-from-scratch-wrap-up-and-what-comes-next", "canonical_source": "https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-wrap-up-and-what-comes-next-2821", "published_at": "2026-06-27 22:21:00+00:00", "updated_at": "2026-06-27 22:35:47.377824+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools", "ai-infrastructure"], "entities": ["pgvector", "Gemini", "MCP", "Claude Desktop", "Render", "Supabase", "Langfuse", "EU AI Act"], "alternates": {"html": "https://wpnews.pro/news/building-a-rag-system-from-scratch-wrap-up-and-what-comes-next", "markdown": "https://wpnews.pro/news/building-a-rag-system-from-scratch-wrap-up-and-what-comes-next.md", "text": "https://wpnews.pro/news/building-a-rag-system-from-scratch-wrap-up-and-what-comes-next.txt", "jsonld": "https://wpnews.pro/news/building-a-rag-system-from-scratch-wrap-up-and-what-comes-next.jsonld"}}