Building a RAG System from Scratch — MCP: Exposing pgvector as a Reusable Tool Server

A developer built a reusable tool server using the Model Context Protocol (MCP) to expose pgvector search functions as a standalone server that any LLM client can connect to. The implementation uses FastMCP to wrap existing search tools—search_documents, search_by_category, and list_categories—making them accessible to clients like Claude Desktop and Gemini agents instead of being hardcoded in a single Python script.

In the previous article https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-ai-agents-memory-planning-and-multi-step-reasoning-1kp9 , we built AI Agents that autonomously search our pgvector database. One limitation remained: the tools were hardcoded inside our Python scripts. Only our code could use them. MCP Model Context Protocol fixes this. It turns our search functions into a standalone server that any LLM client can connect to — Claude Desktop, Gemini agents, or any future client. Tool Use what we built : Python script → hardcoded functions → Gemini API Reusable by: this script only MCP Server what we're building : Any LLM client → MCP protocol → our server → pgvector Reusable by: Claude Desktop, any agent, any language The tools themselves don't change. What changes is where they live and how they're accessed. | Primitive | Role | Our implementation | |---|---|---| Tools | Functions the LLM can call | search documents , search by category , list categories | Resources | Data the LLM can read | db://categories category list | Prompts | Reusable prompt templates | search prompt topic | pip install fastmcp pip freeze requirements.txt mcp server/server.py python mcp server/server.py import psycopg2 from google import genai from google.genai import types as genai types from fastmcp import FastMCP from dotenv import load dotenv import os load dotenv mcp = FastMCP name="pgvector-search", instructions="Document search server using pgvector. " "Covers machine learning, Python, and cloud topics.", gemini client = genai.Client api key=os.getenv "GEMINI API KEY" conn = psycopg2.connect host=os.getenv "DB HOST" , port=os.getenv "DB PORT" , dbname=os.getenv "DB NAME" , user=os.getenv "DB USER" , password=os.getenv "DB PASSWORD" , cur = conn.cursor def get embedding text: str - list float : result = gemini client.models.embed content model="gemini-embedding-001", contents=text, config=genai types.EmbedContentConfig task type="RETRIEVAL QUERY", output dimensionality=768, , return result.embeddings 0 .values ── Tools ───────────────────────────────────────────────────── The @mcp.tool decorator replaces FunctionDeclaration ... entirely. Type hints + docstrings generate the schema automatically. @mcp.tool def search documents query: str, top k: int = 3 - list dict : """ Search all document categories for a given query. Use when the category is unknown or the question spans multiple categories. Args: query: Search query top k: Number of documents to retrieve default: 3 """ q = get embedding query cur.execute """ SELECT title, body, category, 1 - embedding <= %s::vector AS similarity FROM documents ORDER BY embedding <= %s::vector LIMIT %s; """, q, q, top k return {"title": r 0 , "body": r 1 , "category": r 2 , "similarity": round r 3 , 4 } for r in cur.fetchall @mcp.tool def search by category query: str, category: str, top k: int = 3 - list dict : """ Search within a specific category ML, Python, or Cloud . Use when the category is explicitly mentioned in the question. Args: query: Search query category: Category name — ML, Python, or Cloud top k: Number of documents to retrieve default: 3 """ q = get embedding query cur.execute """ SELECT title, body, category, 1 - embedding <= %s::vector AS similarity FROM documents WHERE category = %s ORDER BY embedding <= %s::vector LIMIT %s; """, q, category, q, top k return {"title": r 0 , "body": r 1 , "category": r 2 , "similarity": round r 3 , 4 } for r in cur.fetchall @mcp.tool def list categories - list dict : """ Return all available categories and their document counts. Use this first to understand what data is available. """ cur.execute """ SELECT category, COUNT as count FROM documents GROUP BY category ORDER BY count DESC; """ return {"category": r 0 , "count": r 1 } for r in cur.fetchall ── Resources ───────────────────────────────────────────────── Resources are read-only data the LLM can access directly. @mcp.resource "db://categories" def get categories resource - str: cur.execute """ SELECT category, COUNT as count FROM documents GROUP BY category ORDER BY count DESC; """ lines = f"- {r 0 }: {r 1 } documents" for r in cur.fetchall return "Available categories:\n" + "\n".join lines ── Prompts ─────────────────────────────────────────────────── Reusable prompt templates. @mcp.prompt def search prompt topic: str - str: """Generate a structured search prompt for a given topic.""" return f"""Research the following topic using the available tools: Topic: {topic} Steps: 1. Call list categories to see what data is available 2. If a relevant category exists, use search by category 3. Otherwise use search documents for a broad search 4. Synthesize the results into a clear answer""" ── Entry point ─────────────────────────────────────────────── if name == " main ": mcp.run stdio mode — standard for Claude Desktop mkdir mcp server touch mcp server/ init .py mcp server/client test.py python mcp server/client test.py import asyncio from fastmcp import Client async def test server : async with Client "mcp server/server.py" as client: List available tools tools = await client.list tools print "=== Available tools ===" for tool in tools: print f" - {tool.name}: {tool.description :50 }..." List resources resources = await client.list resources print "\n=== Available resources ===" for r in resources: print f" - {r.uri}" Call a tool print "\n=== list categories ===" result = await client.call tool "list categories", {} print result print "\n=== search documents ===" result = await client.call tool "search documents", {"query": "ML evaluation metrics", "top k": 2} print result Read a resource print "\n=== db://categories resource ===" content = await client.read resource "db://categories" print content if name == " main ": asyncio.run test server python mcp server/client test.py === Available tools === - search documents: Search all document categories for a given... - search by category: Search within a specific category... - list categories: Return all available categories... === list categories === {'category': 'ML', 'count': 2}, {'category': 'Cloud', 'count': 2}, ... 12 mcp agent.py The biggest difference: tool definitions come from the server, not from hardcoded FunctionDeclaration objects. python 12 mcp agent.py import asyncio from google import genai from google.genai import types from fastmcp import Client from dotenv import load dotenv import os import time load dotenv gemini client = genai.Client api key=os.getenv "GEMINI API KEY" async def run agent task: str : print f"\nTask: {task}" print "=" 60 async with Client "mcp server/server.py" as mcp client: Fetch tool definitions from the server automatically mcp tools = await mcp client.list tools Convert MCP tool definitions to Gemini format gemini tools = types.Tool function declarations= types.FunctionDeclaration name=tool.name, description=tool.description or "", parameters=types.Schema type=types.Type.OBJECT, properties={ name: types.Schema type=types.Type.STRING if schema.get "type" == "string" else types.Type.INTEGER if schema.get "type" == "integer" else types.Type.STRING, description=schema.get "description", "" , for name, schema in tool.inputSchema.get "properties" or {} .items }, required=tool.inputSchema.get "required", , , for tool in mcp tools print f"Loaded {len mcp tools } tools from MCP server" contents = types.Content role="user", parts= types.Part text=task for step in range 8 : print f"\n Step {step + 1} " for attempt in range 5 : try: response = gemini client.models.generate content model="gemini-2.5-flash", contents=contents, config=types.GenerateContentConfig tools= gemini tools , break except Exception as e: if "503" in str e or "429" in str e and attempt < 4: time.sleep attempt + 1 10 else: raise candidates = response.candidates if not candidates or not candidates 0 .content.parts: break part = candidates 0 .content.parts 0 if part.function call: func name = part.function call.name func args = dict part.function call.args print f" → {func name} {func args} " Execute via MCP server instead of calling locally result = await mcp client.call tool func name, func args print f" → {len result if isinstance result, list else result} results" contents.append types.Content role="model", parts= types.Part function call=part.function call contents.append types.Content role="user", parts= types.Part function response=types.FunctionResponse name=func name, response={"result": result}, else: text parts = p.text for p in candidates 0 .content.parts if hasattr p, 'text' and p.text print f"\n Done in {step + 1} steps " return "\n".join text parts return "Max steps reached." async def main : result = await run agent "Check the available categories, then explain ML evaluation metrics in detail." print f"\nFinal answer:\n{result}" if name == " main ": asyncio.run main python 12 mcp agent.py Loaded 3 tools from MCP server Step 1 → list categories {} Step 2 → search by category {'query': 'evaluation metrics', 'category': 'ML'} Done in 3 steps If you have Claude Desktop installed, add this to ~/Library/Application Support/Claude/claude desktop config.json : { "mcpServers": { "pgvector-search": { "command": "/path/to/your/project/.venv/bin/python", "args": "/path/to/your/project/mcp server/server.py" , "env": { "GEMINI API KEY": "AIza...", "DB HOST": "localhost", "DB PORT": "5432", "DB NAME": "vectordb", "DB USER": "postgres", "DB PASSWORD": "password" } } } } Restart Claude Desktop. Now you can type "search the pgvector DB for ML evaluation metrics" directly in Claude's chat interface. Note:Use the full path to your .venv/bin/python , not just python . Claude Desktop doesn't activate virtual environments automatically. Note:Claude Desktop currently only supports stdio transport, not HTTP. Use server.py not server http.py in the config. Tool Use — tools defined in code tools = types.Tool function declarations= types.FunctionDeclaration name="search documents", ... handwritten result = search documents query called directly MCP — tools fetched from server mcp tools = await mcp client.list tools fetched dynamically result = await mcp client.call tool name, args executed on server The tools are identical. The difference is where they live. MCP makes them a shared infrastructure component rather than a per-project implementation. In the final article of this series, we'll deploy the MCP server to Render and the pgvector database to Supabase — making everything accessible from anywhere. Full source code: github.com/qameqame/pgvector-tutorial