{"slug": "building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool", "title": "Building a RAG System from Scratch — MCP: Exposing pgvector as a Reusable Tool Server", "summary": "A developer built a reusable tool server using the Model Context Protocol (MCP) to expose pgvector search functions as a standalone server that any LLM client can connect to. The implementation uses FastMCP to wrap existing search tools—search_documents, search_by_category, and list_categories—making them accessible to clients like Claude Desktop and Gemini agents instead of being hardcoded in a single Python script.", "body_md": "In the [previous article](https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-ai-agents-memory-planning-and-multi-step-reasoning-1kp9), we built AI Agents that autonomously search our pgvector database. One limitation remained: the tools were hardcoded inside our Python scripts. Only our code could use them.\n\n**MCP (Model Context Protocol)** fixes this. It turns our search functions into a standalone server that any LLM client can connect to — Claude Desktop, Gemini agents, or any future client.\n\n```\nTool Use (what we built):\n  Python script → hardcoded functions → Gemini API\n  Reusable by: this script only\n\nMCP Server (what we're building):\n  Any LLM client → MCP protocol → our server → pgvector\n  Reusable by: Claude Desktop, any agent, any language\n```\n\nThe tools themselves don't change. 
What changes is where they live and how they're accessed.\n\n| Primitive | Role | Our implementation |\n|---|---|---|\n| Tools | Functions the LLM can call | `search_documents`, `search_by_category`, `list_categories` |\n| Resources | Data the LLM can read | `db://categories` (category list) |\n| Prompts | Reusable prompt templates | `search_prompt(topic)` |\n\n```\npip install fastmcp\npip freeze > requirements.txt\n```\n\n`mcp_server/server.py`\n\n``` python\n# mcp_server/server.py\nimport psycopg2\nfrom google import genai\nfrom google.genai import types as genai_types\nfrom fastmcp import FastMCP\nfrom dotenv import load_dotenv\nimport os\n\nload_dotenv()\n\nmcp = FastMCP(\n    name=\"pgvector-search\",\n    instructions=\"Document search server using pgvector. \"\n                 \"Covers machine learning, Python, and cloud topics.\",\n)\n\ngemini_client = genai.Client(api_key=os.getenv(\"GEMINI_API_KEY\"))\n\nconn = psycopg2.connect(\n    host=os.getenv(\"DB_HOST\"), port=os.getenv(\"DB_PORT\"),\n    dbname=os.getenv(\"DB_NAME\"), user=os.getenv(\"DB_USER\"),\n    password=os.getenv(\"DB_PASSWORD\"),\n)\ncur = conn.cursor()\n\ndef get_embedding(text: str) -> list[float]:\n    result = gemini_client.models.embed_content(\n        model=\"gemini-embedding-001\",\n        contents=text,\n        config=genai_types.EmbedContentConfig(\n            task_type=\"RETRIEVAL_QUERY\",\n            output_dimensionality=768,\n        ),\n    )\n    return result.embeddings[0].values\n\n# ── Tools ─────────────────────────────────────────────────────\n# The @mcp.tool decorator replaces FunctionDeclaration(...) 
entirely.\n# Type hints + docstrings generate the schema automatically.\n\n@mcp.tool\ndef search_documents(query: str, top_k: int = 3) -> list[dict]:\n    \"\"\"\n    Search all document categories for a given query.\n    Use when the category is unknown or the question spans multiple categories.\n\n    Args:\n        query: Search query\n        top_k: Number of documents to retrieve (default: 3)\n    \"\"\"\n    q = get_embedding(query)\n    cur.execute(\"\"\"\n        SELECT title, body, category,\n               1 - (embedding <=> %s::vector) AS similarity\n        FROM documents ORDER BY embedding <=> %s::vector LIMIT %s;\n    \"\"\", (q, q, top_k))\n    return [\n        {\"title\": r[0], \"body\": r[1], \"category\": r[2], \"similarity\": round(r[3], 4)}\n        for r in cur.fetchall()\n    ]\n\n@mcp.tool\ndef search_by_category(query: str, category: str, top_k: int = 3) -> list[dict]:\n    \"\"\"\n    Search within a specific category (ML, Python, or Cloud).\n    Use when the category is explicitly mentioned in the question.\n\n    Args:\n        query: Search query\n        category: Category name — ML, Python, or Cloud\n        top_k: Number of documents to retrieve (default: 3)\n    \"\"\"\n    q = get_embedding(query)\n    cur.execute(\"\"\"\n        SELECT title, body, category,\n               1 - (embedding <=> %s::vector) AS similarity\n        FROM documents WHERE category = %s\n        ORDER BY embedding <=> %s::vector LIMIT %s;\n    \"\"\", (q, category, q, top_k))\n    return [\n        {\"title\": r[0], \"body\": r[1], \"category\": r[2], \"similarity\": round(r[3], 4)}\n        for r in cur.fetchall()\n    ]\n\n@mcp.tool\ndef list_categories() -> list[dict]:\n    \"\"\"\n    Return all available categories and their document counts.\n    Use this first to understand what data is available.\n    \"\"\"\n    cur.execute(\"\"\"\n        SELECT category, COUNT(*) as count\n        FROM documents GROUP BY category ORDER BY count DESC;\n    
\"\"\")\n    return [{\"category\": r[0], \"count\": r[1]} for r in cur.fetchall()]\n\n# ── Resources ─────────────────────────────────────────────────\n# Resources are read-only data the LLM can access directly.\n\n@mcp.resource(\"db://categories\")\ndef get_categories_resource() -> str:\n    cur.execute(\"\"\"\n        SELECT category, COUNT(*) as count\n        FROM documents GROUP BY category ORDER BY count DESC;\n    \"\"\")\n    lines = [f\"- {r[0]}: {r[1]} documents\" for r in cur.fetchall()]\n    return \"Available categories:\\n\" + \"\\n\".join(lines)\n\n# ── Prompts ───────────────────────────────────────────────────\n# Reusable prompt templates.\n\n@mcp.prompt\ndef search_prompt(topic: str) -> str:\n    \"\"\"Generate a structured search prompt for a given topic.\"\"\"\n    return f\"\"\"Research the following topic using the available tools:\n\nTopic: {topic}\n\nSteps:\n1. Call list_categories to see what data is available\n2. If a relevant category exists, use search_by_category\n3. Otherwise use search_documents for a broad search\n4. 
Synthesize the results into a clear answer\"\"\"\n\n# ── Entry point ───────────────────────────────────────────────\nif __name__ == \"__main__\":\n    mcp.run()  # stdio mode — standard for Claude Desktop\n```\n\n```\nmkdir mcp_server\ntouch mcp_server/__init__.py\n```\n\n`mcp_server/client_test.py`\n\n``` python\n# mcp_server/client_test.py\nimport asyncio\nfrom fastmcp import Client\n\nasync def test_server():\n    async with Client(\"mcp_server/server.py\") as client:\n\n        # List available tools\n        tools = await client.list_tools()\n        print(\"=== Available tools ===\")\n        for tool in tools:\n            print(f\"  - {tool.name}: {tool.description[:50]}...\")\n\n        # List resources\n        resources = await client.list_resources()\n        print(\"\\n=== Available resources ===\")\n        for r in resources:\n            print(f\"  - {r.uri}\")\n\n        # Call a tool\n        print(\"\\n=== list_categories ===\")\n        result = await client.call_tool(\"list_categories\", {})\n        print(result)\n\n        print(\"\\n=== search_documents ===\")\n        result = await client.call_tool(\n            \"search_documents\",\n            {\"query\": \"ML evaluation metrics\", \"top_k\": 2}\n        )\n        print(result)\n\n        # Read a resource\n        print(\"\\n=== db://categories resource ===\")\n        content = await client.read_resource(\"db://categories\")\n        print(content)\n\nif __name__ == \"__main__\":\n    asyncio.run(test_server())\n```\n\n```\npython mcp_server/client_test.py\n# === Available tools ===\n#   - search_documents: Search all document categories for a given...\n#   - search_by_category: Search within a specific category...\n#   - list_categories: Return all available categories...\n#\n# === list_categories ===\n# [{'category': 'ML', 'count': 2}, {'category': 'Cloud', 'count': 2}, ...]\n```\n\n`12_mcp_agent.py`\n\nThe biggest difference: tool definitions come from the server, not from hardcoded 
`FunctionDeclaration`\n\nobjects.\n\n``` python\n# 12_mcp_agent.py\nimport asyncio\nfrom google import genai\nfrom google.genai import types\nfrom fastmcp import Client\nfrom dotenv import load_dotenv\nimport os\nimport time\n\nload_dotenv()\ngemini_client = genai.Client(api_key=os.getenv(\"GEMINI_API_KEY\"))\n\nasync def run_agent(task: str):\n    print(f\"\\nTask: {task}\")\n    print(\"=\" * 60)\n\n    async with Client(\"mcp_server/server.py\") as mcp_client:\n\n        # Fetch tool definitions from the server automatically\n        mcp_tools = await mcp_client.list_tools()\n\n        # Convert MCP tool definitions to Gemini format\n        gemini_tools = types.Tool(\n            function_declarations=[\n                types.FunctionDeclaration(\n                    name=tool.name,\n                    description=tool.description or \"\",\n                    parameters=types.Schema(\n                        type=types.Type.OBJECT,\n                        properties={\n                            name: types.Schema(\n                                type=types.Type.STRING\n                                if schema.get(\"type\") == \"string\"\n                                else types.Type.INTEGER\n                                if schema.get(\"type\") == \"integer\"\n                                else types.Type.STRING,\n                                description=schema.get(\"description\", \"\"),\n                            )\n                            for name, schema in\n                            (tool.inputSchema.get(\"properties\") or {}).items()\n                        },\n                        required=tool.inputSchema.get(\"required\", []),\n                    ),\n                )\n                for tool in mcp_tools\n            ]\n        )\n\n        print(f\"Loaded {len(mcp_tools)} tools from MCP server\")\n\n        contents = [types.Content(role=\"user\", parts=[types.Part(text=task)])]\n\n        for step in range(8):\n         
   print(f\"\\n[Step {step + 1}]\")\n\n            for attempt in range(5):\n                try:\n                    response = gemini_client.models.generate_content(\n                        model=\"gemini-2.5-flash\",\n                        contents=contents,\n                        config=types.GenerateContentConfig(tools=[gemini_tools]),\n                    )\n                    break\n                except Exception as e:\n                    if (\"503\" in str(e) or \"429\" in str(e)) and attempt < 4:\n                        time.sleep((attempt + 1) * 10)\n                    else:\n                        raise\n\n            candidates = response.candidates\n            if not candidates or not candidates[0].content.parts:\n                break\n\n            part = candidates[0].content.parts[0]\n\n            if part.function_call:\n                func_name = part.function_call.name\n                func_args = dict(part.function_call.args)\n                print(f\"  → {func_name}({func_args})\")\n\n                # Execute via MCP server instead of calling locally\n                result = await mcp_client.call_tool(func_name, func_args)\n                print(f\"  → {len(result) if isinstance(result, list) else result} results\")\n\n                contents.append(\n                    types.Content(role=\"model\", parts=[types.Part(function_call=part.function_call)])\n                )\n                contents.append(\n                    types.Content(\n                        role=\"user\",\n                        parts=[types.Part(\n                            function_response=types.FunctionResponse(\n                                name=func_name,\n                                response={\"result\": result},\n                            )\n                        )]\n                    )\n                )\n            else:\n                text_parts = [\n                    p.text for p in candidates[0].content.parts\n         
           if hasattr(p, 'text') and p.text\n                ]\n                print(f\"\\n[Done in {step + 1} steps]\")\n                return \"\\n\".join(text_parts)\n\n    return \"Max steps reached.\"\n\nasync def main():\n    result = await run_agent(\n        \"Check the available categories, then explain ML evaluation metrics in detail.\"\n    )\n    print(f\"\\nFinal answer:\\n{result}\")\n\nif __name__ == \"__main__\":\n    asyncio.run(main())\n```\n\n```\npython 12_mcp_agent.py\n# Loaded 3 tools from MCP server\n# [Step 1]\n#   → list_categories({})\n# [Step 2]\n#   → search_by_category({'query': 'evaluation metrics', 'category': 'ML'})\n# [Done in 3 steps]\n```\n\nIf you have Claude Desktop installed, add this to `~/Library/Application Support/Claude/claude_desktop_config.json`:\n\n```\n{\n  \"mcpServers\": {\n    \"pgvector-search\": {\n      \"command\": \"/path/to/your/project/.venv/bin/python\",\n      \"args\": [\"/path/to/your/project/mcp_server/server.py\"],\n      \"env\": {\n        \"GEMINI_API_KEY\": \"AIza...\",\n        \"DB_HOST\": \"localhost\",\n        \"DB_PORT\": \"5432\",\n        \"DB_NAME\": \"vectordb\",\n        \"DB_USER\": \"postgres\",\n        \"DB_PASSWORD\": \"password\"\n      }\n    }\n  }\n}\n```\n\nRestart Claude Desktop. Now you can type \"search the pgvector DB for ML evaluation metrics\" directly in Claude's chat interface.\n\nNote: Use the full path to your `.venv/bin/python`, not just `python`. Claude Desktop doesn't activate virtual environments automatically.\n\nNote: Claude Desktop currently only supports stdio transport, not HTTP. Use `server.py` (not `server_http.py`) in the config.\n\n```\n# Tool Use — tools defined in code\ntools = types.Tool(function_declarations=[\n    types.FunctionDeclaration(name=\"search_documents\", ...)  
# handwritten\n])\nresult = search_documents(query)  # called directly\n\n# MCP — tools fetched from server\nmcp_tools = await mcp_client.list_tools()    # fetched dynamically\nresult = await mcp_client.call_tool(name, args)  # executed on server\n```\n\nThe tools are identical. The difference is where they live. MCP makes them a shared infrastructure component rather than a per-project implementation.\n\nIn the final article of this series, we'll deploy the MCP server to Render and the pgvector database to Supabase — making everything accessible from anywhere.\n\n*Full source code: github.com/qameqame/pgvector-tutorial*", "url": "https://wpnews.pro/news/building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool", "canonical_source": "https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool-server-2onc", "published_at": "2026-06-27 22:17:47+00:00", "updated_at": "2026-06-27 23:03:54.998005+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "artificial-intelligence"], "entities": ["pgvector", "FastMCP", "Model Context Protocol", "Gemini", "Claude Desktop", "Python"], "alternates": {"html": "https://wpnews.pro/news/building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool", "markdown": "https://wpnews.pro/news/building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool.md", "text": "https://wpnews.pro/news/building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool.txt", "jsonld": "https://wpnews.pro/news/building-a-rag-system-from-scratch-mcp-exposing-pgvector-as-a-reusable-tool.jsonld"}}