Building a cross-user review graph with pgvector on Amazon Aurora

A developer built a universal review graph that aggregates ratings for any item across users using pgvector on Amazon Aurora PostgreSQL Serverless v2. The system matches differently spelled item descriptions via cosine similarity on 1024-dim embeddings, with canonical embeddings updated as a running centroid of linked user items. The solution avoids separate vector and relational stores by handling both matching and aggregation in a single database.

Every review app is a silo. Yelp reviews places, Amazon reviews its own catalog, Letterboxd reviews film. I wanted to build the opposite: one place where you can log and rate anything — a ball-point pen, a burger, a cruise — and have your review pool together with everyone else's. The hard part isn't the form. It's this: when I log "In-N-Out Double-Double" and you log "in n out double double burger," those need to become the same thing so our ratings aggregate. No shared product ID, no barcode, no agreement on spelling. Just two humans describing the same item differently. That's a similarity problem, and I solved it with pgvector running inside Amazon Aurora PostgreSQL Serverless v2 — the whole app, deployed on Vercel at Two tables carry the entire idea: user items canonical items The matcher's only job is connecting a user item to the right canonical item . That edge is the product. Every item carries a 1024-dim embedding. The canonical row's embedding is a vector 1024 , indexed with HNSW for cosine ANN search: // db/schema.ts Drizzle embedding: vector "embedding", { dimensions: 1024 } .notNull , // ... index "idx canon embedding" .using "hnsw", t.embedding.op "vector cosine ops" , I made the canonical embedding NOT NULL on purpose : a canonical entry must always be matchable, so every creation path supplies a vector and the recompute logic preserves it rather than ever averaging to NULL. On every add, I embed the new item and run an approximate-nearest-neighbour search over the canonical catalog. Drizzle doesn't model vector operators, so the query is a raw sql template: js SELECT id, name, photo url, rating avg, rating count, 1 - embedding <= :q AS similarity FROM canonical items WHERE embedding IS NOT NULL ORDER BY embedding <= :q LIMIT 10; <= is pgvector's cosine distance; 1 - distance is similarity. A high-confidence hit auto-suggests as the default "we think this is it" ; otherwise the user sees the top candidates plus a "None of these — create new" escape hatch. Matching is optional and non-blocking : the user item is saved instantly with canonical item id = NULL , and the match just fills it in — at upload time or from a queue later. The base loop track my own stuff never has friction. Here's the part I'm proud of. A canonical item's embedding is the running centroid of its linked members' embeddings. When you link your burger log to mine, the canonical vector moves to the average of both. The more people log the same thing, the sharper and more representative that vector becomes — the catalog improves itself with use. And because "sort by rating" needs to stay fast at scale, the rating aggregates are denormalized onto the canonical row , with rating avg as a stored generated column: rating avg real GENERATED ALWAYS AS CASE WHEN rating count 0 THEN rating sum::real / rating count END STORED So the canonical page — the payoff screen — reads one row to show "4.6★ from 11 reviews across strangers," no aggregation query at render time. I evaluated the three qualifying AWS databases: pgvector and PostGIS extensions, so it can't do the core matching feature.That last point is the whole reason this was buildable by one person on a deadline. The matcher result and the relational aggregation live in the same database, so linking an item and recomputing its canonical's centroid + ratings is a single transaction — not a dance between a vector store and an RDBMS. Next.js 16 App Router, Server Actions on Vercel → Aurora PostgreSQL Serverless v2 via pg /Drizzle, with pgvector + PostGIS + pg trgm . Embeddings come from Amazon Bedrock more on that in the next post . Swapping local Docker Postgres for Aurora was a one-line DATABASE URL change — same driver, same SQL. The result is a universal, cross-user review graph for literally anything, and the magic is one line of SQL: ORDER BY embedding <= :q . Built for the H0 Hackathon "Hack the Zero Stack with Vercel and AWS Databases" . I created this content for the purposes of entering this hackathon. H0Hackathon