Building a RAG System from Scratch — Cloud Deployment with Render and Supabase

wpnews.pro

In the previous article, we built an MCP server that any LLM client can connect to locally. In this article, we'll deploy it to the cloud — making it accessible from anywhere.

Before: localhost:8000 → pgvector (Docker)
After:  https://your-app.onrender.com/mcp → Supabase (pgvector)

Both services are free to start with no credit card required.

Service	Role	Free tier
Render
Host the MCP HTTP server	Persistent web service (sleeps after 15 min)
Supabase
Managed PostgreSQL + pgvector	500MB, persistent

Open SQL Editor in the Supabase dashboard and run:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
    id          SERIAL PRIMARY KEY,
    title       TEXT NOT NULL,
    body        TEXT NOT NULL,
    category    TEXT,
    created_at  TIMESTAMP DEFAULT NOW(),
    embedding   vector(768)
);

CREATE INDEX IF NOT EXISTS docs_embedding_idx
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

CREATE INDEX ON documents (category);

Click the Connect button at the top of the dashboard (not Settings → Database — the UI has changed).

Select the Connection pooling tab and copy the Transaction mode URI. It looks like:

postgresql://postgres.xxxx:password@aws-0-ap-northeast-1.pooler.supabase.com:6543/postgres

Why port 6543?The standard port 5432 uses IPv6, which Render doesn't support. The Connection Pooler (port 6543) uses IPv4 and is the correct choice for cloud-to-cloud connections.

Add the Supabase URL to your .env

:

DATABASE_URL=postgresql://postgres.xxxx:password@aws-0-ap-northeast-1.pooler.supabase.com:6543/postgres

Then run the migration:

import psycopg2
from dotenv import load_dotenv
import os

load_dotenv()

local_conn = psycopg2.connect(
    host=os.getenv("DB_HOST"), port=os.getenv("DB_PORT"),
    dbname=os.getenv("DB_NAME"), user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)
local_cur = local_conn.cursor()

supa_conn = psycopg2.connect(os.getenv("DATABASE_URL"), sslmode="require")
supa_cur = supa_conn.cursor()

local_cur.execute("SELECT title, body, category, embedding FROM documents;")
rows = local_cur.fetchall()
print(f"Migrating {len(rows)} documents...")

for row in rows:
    title, body, category, embedding = row
    supa_cur.execute("""
        INSERT INTO documents (title, body, category, embedding)
        VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING;
    """, (title, body, category, embedding))

supa_conn.commit()

supa_cur.execute("SELECT COUNT(*) FROM documents;")
count = supa_cur.fetchone()[0]
print(f"Done. Documents in Supabase: {count}")

local_conn.close()
supa_conn.close()
python migrate_to_supabase.py

Create mcp_server/server_render.py

. The only differences from server.py

:

DATABASE_URL

env varPORT

env var (Render sets this automatically)streamable-http

instead of stdio

import psycopg2
from google import genai
from google.genai import types as genai_types
from fastmcp import FastMCP
from dotenv import load_dotenv
import os

load_dotenv()

mcp = FastMCP(
    name="pgvector-search",
    instructions="Document search server using pgvector.",
)

gemini_client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

DATABASE_URL = os.getenv("DATABASE_URL")
if DATABASE_URL:
    conn = psycopg2.connect(DATABASE_URL, sslmode="require")
else:
    conn = psycopg2.connect(
        host=os.getenv("DB_HOST", "localhost"),
        port=os.getenv("DB_PORT", "5432"),
        dbname=os.getenv("DB_NAME", "vectordb"),
        user=os.getenv("DB_USER", "postgres"),
        password=os.getenv("DB_PASSWORD", "password"),
    )

cur = conn.cursor()

def get_embedding(text: str) -> list[float]:
    result = gemini_client.models.embed_content(
        model="gemini-embedding-001",
        contents=text,
        config=genai_types.EmbedContentConfig(
            task_type="RETRIEVAL_QUERY",
            output_dimensionality=768,
        ),
    )
    return result.embeddings[0].values

@mcp.tool
def search_documents(query: str, top_k: int = 3) -> list[dict]:
    """Search all document categories for a given query."""
    q = get_embedding(query)
    cur.execute("""
        SELECT title, body, category,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents ORDER BY embedding <=> %s::vector LIMIT %s;
    """, (q, q, top_k))
    return [
        {"title": r[0], "body": r[1], "category": r[2], "similarity": round(r[3], 4)}
        for r in cur.fetchall()
    ]

@mcp.tool
def search_by_category(query: str, category: str, top_k: int = 3) -> list[dict]:
    """Search within a specific category (ML, Python, or Cloud)."""
    q = get_embedding(query)
    cur.execute("""
        SELECT title, body, category,
               1 - (embedding <=> %s::vector) AS similarity
        FROM documents WHERE category = %s
        ORDER BY embedding <=> %s::vector LIMIT %s;
    """, (q, category, q, top_k))
    return [
        {"title": r[0], "body": r[1], "category": r[2], "similarity": round(r[3], 4)}
        for r in cur.fetchall()
    ]

@mcp.tool
def list_categories() -> list[dict]:
    """Return all available categories and document counts."""
    cur.execute("""
        SELECT category, COUNT(*) as count
        FROM documents GROUP BY category ORDER BY count DESC;
    """)
    return [{"category": r[0], "count": r[1]} for r in cur.fetchall()]

if __name__ == "__main__":
    port = int(os.getenv("PORT", 8000))  # Render sets PORT automatically
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=port,
    )

Push to GitHub:

git add .
git commit -m "feat: add Render deployment server"
git push origin main

Field	Value
Name	`pgvector-mcp-server`
Runtime	Python 3
Build Command	`pip install -r requirements.txt`
Start Command	`python mcp_server/server_render.py`
Instance Type	Free

Key	Value
`GEMINI_API_KEY`
`AIza...`
`DATABASE_URL`
The Connection Pooler URI from Supabase (port 6543)

Once deployed, visit:

https://pgvector-mcp-server.onrender.com/mcp

You'll see:

{"jsonrpc":"2.0","id":"server-error","error":{"code":-32600,"message":"Not Acceptable: Client must accept text/event-stream"}}

This is correct — it means the server is running. The error appears because browsers aren't MCP clients. Your agent will connect without issues.

First request is slow:Render's free tier sleeps after 15 minutes of inactivity. The first request after sleep takes 30–60 seconds to wake up. UseUptimeRobot to ping every 5 minutes and prevent sleep.

Update 13_mcp_http_agent.py

with the Render URL:

MCP_SERVER_URL = "https://pgvector-mcp-server.onrender.com/mcp"

async def run_agent(task: str):
    async with Client(MCP_SERVER_URL) as mcp_client:  # URL instead of file path
        mcp_tools = await mcp_client.list_tools()
        print(f"Loaded {len(mcp_tools)} tools from remote MCP server")
python 13_mcp_http_agent.py

The agent is now querying pgvector on Supabase through an MCP server running on Render — entirely in the cloud.

Error	Cause	Fix
`Network is unreachable` (IPv6)
Using port 5432	Use Connection Pooler URL (port 6543)
`SSL connection required`
Missing sslmode	Add `sslmode="require"`

`ModuleNotFoundError`
Missing package	Run `pip freeze > requirements.txt` and push
Slow first response	Render sleep	Use UptimeRobot to keep alive
`Not Found` at `/`

Wrong URL	Add `/mcp` to the URL

Local development:
  Python agent → stdio → mcp_server/server.py → Docker pgvector

Cloud deployment:
  Python agent → HTTPS → Render (server_render.py) → Supabase pgvector

The codebase is identical. The infrastructure changed around it.

In the final article, we'll wrap up the series with a summary of all design decisions and point to Vol.2 — where we cover Evals, Observability, Security, MLOps, Fine-tuning, Multi-Agent, and Governance.

Full source code: github.com/qameqame/pgvector-tutorial

source & further reading

dev.to — original article Building a RAG System from Scratch — Wrap-up and What Comes Next Building a RAG System from Scratch — Tool Use: Let the LLM Search Autonomously Building a RAG System from Scratch — Design Decisions Explained

Building a RAG System from Scratch — Cloud Deployment with Render and Supabase

Run your AI side-project on zahid.host