Semantic Search with PostgreSQL: Pragmatism Beats Hype - Most of the Time A developer advocates using PostgreSQL with the pgvector extension for semantic search instead of dedicated vector databases like Pinecone or Weaviate. The approach reduces infrastructure complexity by storing embeddings alongside relational data and querying with SQL. Key considerations include choosing a consistent embedding model and chunking documents for better search accuracy. When you start adding semantic search to an application, the obvious options are often Pinecone, Weaviate, Qdrant, Milvus, or another dedicated vector database. That can be the right choice. But many applications already have a PostgreSQL database running. And for a large class of semantic search use cases, that database can do the job directly. The key is pgvector , an open-source PostgreSQL extension that adds vector types and vector similarity search to Postgres. It lets you store embeddings next to your relational data and query them with SQL. The advantage is not only fewer moving parts. It is also architectural: No separate sync pipeline, no second source of truth, and no extra infrastructure until you actually need it. Classic search compares words. Semantic search compares meaning. The query "How do I get over the pass?" can find a text about a mountain pass rather than a school hallway because both the query and the documents are represented as numerical vectors, also called embeddings. The basic flow is: The phrase "same vector space" matters. You cannot freely mix embeddings from different models. Before creating the database schema, choose the embedding model. That choice determines: For example, if you use OpenAI text-embedding-3-small , a vector 1536 column is a common fit. If you use another model, including a local model through Ollama, the dimension may be different. This is not a detail. PostgreSQL will reject vectors with the wrong number of dimensions. If you later change the embedding model, plan to: Embedding model changes are data migrations. Enable the extension once in the database: CREATE EXTENSION IF NOT EXISTS vector; With managed PostgreSQL providers, check whether pgvector is available on your plan and PostgreSQL version. Many providers support it, but you should verify this before designing around it. For small records, one vector per row is fine. For real documents, it is usually better to embed chunks. A long document may cover several topics. One vector for the entire document often becomes too blurry. A practical schema: CREATE TABLE documents id BIGSERIAL PRIMARY KEY, title TEXT NOT NULL, source TEXT, created at TIMESTAMPTZ NOT NULL DEFAULT NOW ; CREATE TABLE document chunks id BIGSERIAL PRIMARY KEY, document id BIGINT NOT NULL REFERENCES documents id ON DELETE CASCADE, chunk index INTEGER NOT NULL, content TEXT NOT NULL, embedding vector 1536 NOT NULL, created at TIMESTAMPTZ NOT NULL DEFAULT NOW , UNIQUE document id, chunk index ; Use vector 1536 only if your embedding model actually returns 1536-dimensional vectors. Otherwise, change the dimension to match the model. Install the packages: dotnet add package Npgsql dotnet add package Pgvector If you use Dapper, also add: dotnet add package Pgvector.Dapper For raw Npgsql, configure the data source with UseVector : js using Npgsql; using Pgvector; var dataSourceBuilder = new NpgsqlDataSourceBuilder connectionString ; dataSourceBuilder.UseVector ; await using var dataSource = dataSourceBuilder.Build ; In production, create the extension and tables through migrations or database provisioning. If you create the extension at runtime, reload PostgreSQL types on the connection before using the new type. With the OpenAI .NET SDK: js using OpenAI.Embeddings; var embeddingClient = new EmbeddingClient "text-embedding-3-small", apiKey ; async Task