A Coding Guide to Implement a pgvector-Powered Semantic, Hybrid, Sparse, and Quantized Vector Search System

A new coding tutorial demonstrates how to build a complete pgvector-powered vector search system inside Google Colab, supporting semantic, hybrid, sparse, and quantized search methods. The guide walks through installing PostgreSQL and the pgvector extension, creating embeddings with SentenceTransformers, and implementing HNSW indexes for efficient similarity search. The system enables practical retrieval-augmented generation and recommendation applications using only open-source tools.

In this tutorial, we build a complete pgvector https://github.com/pgvector/pgvector-python playground inside Google Colab and explore how PostgreSQL can work as a powerful vector database for modern AI applications. We start by installing PostgreSQL, compiling the pgvector extension, connecting through Psycopg, and registering vector types for smooth Python integration. Then, we create embeddings with SentenceTransformers, store them in PostgreSQL, build HNSW indexes, and run semantic search, filtered search, distance metric comparisons, half-precision storage, binary quantization, sparse vector search, hybrid retrieval, and vector aggregation. Through this workflow, we learn how pgvector supports practical retrieval-augmented generation, recommendation, similarity search, and hybrid search systems using only open-source tools. python import os import subprocess import sys import time def sh cmd: str, check: bool = True : """Run a shell command, streaming a compact log.""" print f" $ {cmd}" return subprocess.run cmd, shell=True, check=check, stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT print " 0/10 Installing PostgreSQL + building pgvector ≈1–2 min ..." sh "apt-get -qq update" sh "apt-get -qq install -y postgresql postgresql-contrib " "postgresql-server-dev-all build-essential git" if not os.path.exists "/tmp/pgvector" : sh "git clone --depth 1 https://github.com/pgvector/pgvector.git /tmp/pgvector" sh "cd /tmp/pgvector && make && make install" sh "service postgresql start" time.sleep 3 sh """sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'postgres';" """ print " 0/10 Installing Python packages..." sh f"{sys.executable} -m pip install -q pgvector psycopg binary " f"sentence-transformers numpy" We set up the complete PostgreSQL and pgvector environment. We install the required system packages, clone and build pgvector from source, start the PostgreSQL service, and configure the database password. We also install the Python dependencies needed to connect to PostgreSQL and work with vector embeddings. python import numpy as np import psycopg from pgvector import HalfVector, SparseVector from pgvector.psycopg import register vector from sentence transformers import SentenceTransformer print "\n 1/10 Connecting and enabling the 'vector' extension..." conn = psycopg.connect "host=127.0.0.1 port=5432 dbname=postgres user=postgres password=postgres", autocommit=True, conn.execute "CREATE EXTENSION IF NOT EXISTS vector" register vector conn ver = conn.execute "SELECT extversion FROM pg extension WHERE extname='vector'" .fetchone 0 print f" pgvector version: {ver}" print "\n 2/10 Loading embedding model + encoding corpus..." model = SentenceTransformer "all-MiniLM-L6-v2" DIM = model.get sentence embedding dimension corpus = "Octopuses have three hearts and blue blood.", "animals" , "Transformers revolutionized natural language processing.","technology" , "Quantum computers exploit superposition and entanglement.","technology" , "GPUs accelerate deep learning by parallelizing matrix math.","technology" , "Sourdough bread relies on wild yeast and lactobacilli.", "food" , "Dark chocolate contains flavonoid antioxidants.", "food" , "A black hole's gravity is so strong light cannot escape.","space" contents = c for c, in corpus categories = k for , k in corpus embeddings = model.encode contents, normalize embeddings=True conn.execute "DROP TABLE IF EXISTS documents" conn.execute f""" CREATE TABLE documents id bigserial PRIMARY KEY, content text, category text, embedding vector {DIM} """ with conn.cursor as cur: cur.executemany "INSERT INTO documents content, category, embedding VALUES %s, %s, %s ", list zip contents, categories, np.asarray e for e in embeddings , print f" Inserted {len corpus } documents with {DIM}-d embeddings." We connect to PostgreSQL, enable the pgvector extension, and register vector support with Psycopg. We load the SentenceTransformers model, define a small text corpus, generate normalized embeddings, and create a PostgreSQL table for storing documents. We then insert each document with its category and vector representation so that we can perform semantic search later. print "\n 3/10 Building HNSW index and running semantic search..." conn.execute "CREATE INDEX ON documents USING hnsw embedding vector cosine ops " "WITH m = 16, ef construction = 64 " conn.execute "SET hnsw.ef search = 100" def semantic search query: str, k: int = 4 : q = np.asarray model.encode query, normalize embeddings=True return conn.execute "SELECT content, category, embedding <= %s AS distance " "FROM documents ORDER BY distance LIMIT %s", q, k , .fetchall for content, cat, dist in semantic search "animals that are unusually quick" : print f" {dist:.3f} {cat:<10} {content}" print "\n 4/10 Filtered search only category = 'space' ..." q = np.asarray model.encode "objects with extreme gravity", normalize embeddings=True rows = conn.execute "SELECT content, embedding <= %s AS distance " "FROM documents WHERE category = %s ORDER BY distance LIMIT 3", q, "space" , .fetchall for content, dist in rows: print f" {dist:.3f} {content}" print "\n 5/10 Same query under different distance metrics top hit each ..." q = np.asarray model.encode "brewing a hot caffeinated drink", normalize embeddings=True for op, label in "<- ", "L2" , "<= ", "cosine" , "< ", "neg-inner" , "<+ ", "L1" : content, score = conn.execute f"SELECT content, embedding {op} %s AS s FROM documents ORDER BY s LIMIT 1", q, .fetchone print f" {label:<10} {score:+.3f} {content}" We build an HNSW index on the embedding column to enable faster, more efficient vector search. We define a semantic search function that converts a query into an embedding and retrieves the most similar documents using cosine similarity. We also perform metadata-filtered search and compare different pgvector distance operators such as L2, cosine, negative inner product, and L1. print "\n 6/10 Half-precision storage with halfvec..." conn.execute f"ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding half halfvec {DIM} " conn.execute "UPDATE documents SET embedding half = embedding::halfvec" conn.execute "CREATE INDEX ON documents USING hnsw embedding half halfvec cosine ops " q half = HalfVector model.encode "the galaxy we live in", normalize embeddings=True rows = conn.execute "SELECT content, embedding half <= %s AS d FROM documents ORDER BY d LIMIT 2", q half, , .fetchall for content, d in rows: print f" {d:.3f} {content}" print "\n 7/10 Binary quantization Hamming + exact re-rank..." conn.execute f"CREATE INDEX ON documents " f"USING hnsw binary quantize embedding ::bit {DIM} bit hamming ops " q = np.asarray model.encode "parallel hardware for AI training", normalize embeddings=True rerank sql = f""" SELECT content, candidates.embedding <= % q s AS exact distance FROM SELECT content, embedding FROM documents ORDER BY binary quantize embedding ::bit {DIM} <~ binary quantize % q s ::bit {DIM} LIMIT 8 AS candidates ORDER BY exact distance LIMIT 3 """ for content, d in conn.execute rerank sql, {"q": q} .fetchall : print f" {d:.3f} {content}" print "\n 8/10 Native sparse vectors..." conn.execute "DROP TABLE IF EXISTS sparse items" conn.execute "CREATE TABLE sparse items id bigserial PRIMARY KEY, embedding sparsevec 10 " sparse data = SparseVector {0: 1.0, 3: 2.0, 7: 1.5}, 10 , SparseVector {1: 0.5, 3: 1.0, 9: 3.0}, 10 , SparseVector {0: 0.2, 4: 2.5, 7: 0.8}, 10 , with conn.cursor as cur: cur.executemany "INSERT INTO sparse items embedding VALUES %s ", v, for v in sparse data query sparse = SparseVector {0: 1.0, 7: 1.0}, 10 rows = conn.execute "SELECT id, embedding, embedding < %s AS neg ip " "FROM sparse items ORDER BY neg ip LIMIT 3", query sparse, , .fetchall for id, vec, neg ip in rows: print f" id={ id} inner product={-neg ip:.2f} nnz indices={vec.indices }" We explore advanced pgvector storage and retrieval techniques beyond standard dense vectors. We convert embeddings into half-precision vectors to reduce storage, use binary quantization with Hamming search for fast candidate retrieval, and then re-rank results with full-precision vectors. We also create sparse vectors and query them using inner-product similarity, which is useful for keyword-weighted or SPLADE-style retrieval. print "\n 9/10 Hybrid search vector + full-text via RRF..." user query = "fast animal" qvec = np.asarray model.encode user query, normalize embeddings=True hybrid sql = """ WITH semantic AS SELECT id, RANK OVER ORDER BY embedding <= % qvec s AS rank FROM documents ORDER BY embedding <= % qvec s LIMIT 20 , keyword AS SELECT d.id, RANK OVER ORDER BY ts rank cd to tsvector 'english', d.content , q DESC AS rank FROM documents d, plainto tsquery 'english', % qtext s AS q WHERE to tsvector 'english', d.content @@ q LIMIT 20 SELECT d.content, COALESCE 1.0 / 60 + semantic.rank , 0.0 + COALESCE 1.0 / 60 + keyword.rank , 0.0 AS rrf score FROM documents d LEFT JOIN semantic ON d.id = semantic.id LEFT JOIN keyword ON d.id = keyword.id WHERE semantic.id IS NOT NULL OR keyword.id IS NOT NULL ORDER BY rrf score DESC LIMIT 4 """ for content, score in conn.execute hybrid sql, {"qvec": qvec, "qtext": user query} .fetchall : print f" {score:.5f} {content}" print "\n 10/10 Aggregating vectors with AVG category centroid ..." centroid = conn.execute "SELECT AVG embedding FROM documents WHERE category = %s", "food", .fetchone 0 typical = conn.execute "SELECT content, embedding <= %s AS d FROM documents " "WHERE category = %s ORDER BY d LIMIT 1", np.asarray centroid , "food" , .fetchone print f" Centroid dim = {len centroid }" print f" Most representative 'food' doc: {typical 0 }" print "\n✅ Done. You now have a working pgvector playground inside Colab." print " Try editing corpus , the queries, or swap in your own embedding model." We combine semantic vector search with PostgreSQL full-text search using Reciprocal Rank Fusion. We retrieve results from both semantic and keyword rankings, merge their scores, and produce a stronger hybrid search output. Finally, we compute the average embedding for a category and use it as a centroid to find the most representative document in that group. In conclusion, we have a working pgvector-based retrieval system that runs entirely in Google Colab, without external services or API keys. We used PostgreSQL not just as a traditional relational database, but as a flexible vector search engine that supports dense vectors, half-precision vectors, binary-quantized retrieval, sparse vectors, full-text search, and aggregation. We also observed how metadata filtering, HNSW indexing, Reciprocal Rank Fusion, and centroid-based analysis make pgvector useful for real-world AI search pipelines. Check out the Full Codes with Notebook here. Also, feel free to follow us on and don’t forget to join our Twitter https://x.com/intent/follow?screen name=marktechpost and Subscribe to 150k+ ML SubReddit https://www.reddit.com/r/machinelearningnews/ . Wait are you on telegram? our Newsletter https://www.aidevsignals.com/ now you can join us on telegram as well. https://t.me/machinelearningresearchnews Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us https://forms.gle/wbash1wF6efRj8G58 Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions. - Sana Hassan - Sana Hassan - Sana Hassan - Sana Hassan