Building a RAG System from Scratch with pgvector and Gemini — Introduction

wpnews.pro

cd /news/large-language-models/building-a-rag-system-from-scratch-w… · home › topics › large-language-models › article

[ARTICLE · art-42097] src=dev.to ↗ pub=2026-06-27T21:59Z topic=large-language-models verified=true sentiment=↑ positive

Building a RAG System from Scratch with pgvector and Gemini — Introduction

A developer built a RAG (Retrieval-Augmented Generation) system from scratch using pgvector and Google Gemini, implementing a full pipeline for embedding, vector storage, and semantic search. The project covers six steps from core implementation to cloud deployment on Render and Supabase, with source code available on GitHub.

read3 min views1 publishedJun 27, 2026

When you start building LLM-powered applications, one pattern becomes unavoidable: RAG (Retrieval-Augmented Generation).

LLMs only know what they were trained on. Your company's internal documents, the latest spec sheets, project-specific information — none of that exists in the model. To handle data the model doesn't know, you need a system that retrieves relevant knowledge in real time and injects it into the context. That's RAG.

In this guide, we'll implement a RAG system from scratch using pgvector and Gemini, then extend it step by step through Tool Use, AI Agents, MCP, and cloud deployment.

Step 1: Embedding · Vector DB · RAG — core implementation
Step 2: AI Architect perspective — design decisions explained
Step 3: Tool Use — LLM autonomously searches the DB
Step 4: AI Agents — combining multiple tools
Step 5: MCP — exposing tools as a server
Step 6: Cloud deployment — Render × Supabase

Computers can't measure "semantic similarity" from raw text. Embedding converts text into a list of numbers (a vector), and semantically similar words produce numerically similar patterns.

"dog"  → [0.82, 0.75, 0.10, ...]  768 numbers
"cat"  → [0.78, 0.72, 0.12, ...]  ← similar pattern to "dog"
"bank" → [0.08, 0.10, 0.85, ...]  ← completely different

Gemini's embedding model handles this conversion.

A regular DB searches by keyword matching. A vector DB searches by numeric distance — meaning it finds semantically related documents even when the exact words don't match.

-- Regular search (misses if keywords don't match)
SELECT * FROM docs WHERE body LIKE '%F1 score%';

-- Vector search (finds semantically related docs)
SELECT * FROM docs ORDER BY embedding <=> query_vector LIMIT 3;

Search for "how to measure model performance" and it finds "F1 score calculation" — even without matching words. We use pgvector, a PostgreSQL extension, for this.

LLMs are limited to their training data. RAG is a design pattern that retrieves relevant documents and passes them to the LLM as context, enabling the model to answer questions about data it has never seen.

[Plain LLM]  question → answers from training data only
[RAG]        question → search Vector DB → pass results to LLM → grounded answer

Tool	Purpose	Free Tier
Google Gemini API	Embedding generation · answer generation	1,500 requests/day
pgvector (PostgreSQL extension)	Vector storage · search	Unlimited (local)
Docker	Run pgvector locally	Unlimited
Python 3.12	Implementation language	—
Render	Deploy MCP server	Free web service (with sleep)
Supabase	Cloud pgvector	500MB persistent free

This guide focuses on the Applied and Design phases — the first big implementation step after learning the fundamentals (LLM basics, Prompt Engineering, API/SDK usage).

Topic	What we implement
✓	RAG
Full RAG pipeline with pgvector and Gemini
✓	Embedding
Text-to-vector conversion with Gemini Embedding API
✓	Vector DB
Cosine similarity search with pgvector

Let's get started in the next article with environment setup and the first implementation.

Source code: github.com/qameqame/pgvector-tutorial

source & further reading

dev.to — original article Agents Are Learning to Write Their Own SKILL.md Files Inside An AI Agent: Planning, Tool Use, Memory, Constraints, And Verification From "I Can't Click" to a Full Testing Harness: How We Built Playwright for the Terminal

~/api · this article 200

$curl api.wpnews.pro/v1/news/building-a-rag-system-fr…

Read original on dev.to → dev.to/hiroki-kameyama/building-a-rag-system-fro…

mentioned entities

Google Gemini

pgvector

PostgreSQL

Docker

Render

Supabase

Python

GitHub

metadata

slugbuilding-a-rag-system-from-scratch-with-pgvector-and-gemini-introduction

topic#large-language-models

secondary4 topics

sentimentpositive

canonicaldev.to

navigation

← prevAgents Are Learning to Write The…

── more in #large-language-models 4 stories · sorted by recency

dev.to · 27 Jun · #large-language-models

SQL + AI: Real-World Database Solutions You Can Use Today

dev.to · 27 Jun · #large-language-models

We Run 9 AI Agents on 2 CPU Cores and 3.6GB RAM: The Engineering Memoir

dev.to · 27 Jun · #large-language-models

Building an AI-Powered CRM System: A Practical Overview

dev.to · 26 Jun · #large-language-models

How I Built a Personal AI Knowledge Base with Amazon Aurora pgvector and Next.js — AWS H0 Hackathon

── more on @google gemini 3 stories trending now

wpnews · 25 May · #artificial-intelligence

Maia-3: free and open source

wpnews · 28 May · #ai-startups

The Niche SaaS Opportunity Map 2026: Highly Demanded Subscribed Categories Beyond Mainstream

wpnews · 1 Nov · #developer-tools

Custom Zig Test Runner, better ouput, timing display, and support for special "tests:beforeAll" and "tests:afterAll" tests

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required