# Building a RAG System from Scratch with pgvector and Gemini — Introduction

> Source: <https://dev.to/hiroki-kameyama/building-a-rag-system-from-scratch-with-pgvector-and-gemini-introduction-c8i>
> Published: 2026-06-27 21:59:18+00:00

When you start building LLM-powered applications, one pattern becomes unavoidable: **RAG (Retrieval-Augmented Generation)**.

LLMs only know what they were trained on. Your company's internal documents, the latest spec sheets, project-specific information — none of that exists in the model. To handle data the model doesn't know, you need a system that retrieves relevant knowledge in real time and injects it into the context. That's RAG.

In this guide, we'll implement a RAG system from scratch using pgvector and Gemini, then extend it step by step through Tool Use, AI Agents, MCP, and cloud deployment.

```
Step 1: Embedding · Vector DB · RAG — core implementation
Step 2: AI Architect perspective — design decisions explained
Step 3: Tool Use — LLM autonomously searches the DB
Step 4: AI Agents — combining multiple tools
Step 5: MCP — exposing tools as a server
Step 6: Cloud deployment — Render × Supabase
```

Computers can't measure "semantic similarity" from raw text. Embedding converts text into a list of numbers (a vector), and semantically similar words produce numerically similar patterns.

```
"dog"  → [0.82, 0.75, 0.10, ...]  768 numbers
"cat"  → [0.78, 0.72, 0.12, ...]  ← similar pattern to "dog"
"bank" → [0.08, 0.10, 0.85, ...]  ← completely different
```

Gemini's embedding model handles this conversion.

A regular DB searches by keyword matching. A vector DB searches by **numeric distance** — meaning it finds semantically related documents even when the exact words don't match.

```
-- Regular search (misses if keywords don't match)
SELECT * FROM docs WHERE body LIKE '%F1 score%';

-- Vector search (finds semantically related docs)
SELECT * FROM docs ORDER BY embedding <=> query_vector LIMIT 3;
```

Search for "how to measure model performance" and it finds "F1 score calculation" — even without matching words. We use **pgvector**, a PostgreSQL extension, for this.

LLMs are limited to their training data. RAG is a design pattern that retrieves relevant documents and passes them to the LLM as context, enabling the model to answer questions about data it has never seen.

```
[Plain LLM]  question → answers from training data only
[RAG]        question → search Vector DB → pass results to LLM → grounded answer
```

| Tool | Purpose | Free Tier |
|---|---|---|
| Google Gemini API | Embedding generation · answer generation | 1,500 requests/day |
| pgvector (PostgreSQL extension) | Vector storage · search | Unlimited (local) |
| Docker | Run pgvector locally | Unlimited |
| Python 3.12 | Implementation language | — |
| Render | Deploy MCP server | Free web service (with sleep) |
| Supabase | Cloud pgvector | 500MB persistent free |

This guide focuses on the **Applied** and **Design** phases — the first big implementation step after learning the fundamentals (LLM basics, Prompt Engineering, API/SDK usage).

| Topic | What we implement | |
|---|---|---|
| ✓ | RAG |
Full RAG pipeline with pgvector and Gemini |
| ✓ | Embedding |
Text-to-vector conversion with Gemini Embedding API |
| ✓ | Vector DB |
Cosine similarity search with pgvector |

Let's get started in the next article with environment setup and the first implementation.

*Source code: github.com/qameqame/pgvector-tutorial*