cd /news/machine-learning/building-a-cross-user-review-graph-w… Β· home β€Ί topics β€Ί machine-learning β€Ί article
[ARTICLE Β· art-42954] src=dev.to β†— pub= topic=machine-learning verified=true sentiment=↑ positive

Building a cross-user review graph with pgvector on Amazon Aurora

A developer built a universal review graph that aggregates ratings for any item across users using pgvector on Amazon Aurora PostgreSQL Serverless v2. The system matches differently spelled item descriptions via cosine similarity on 1024-dim embeddings, with canonical embeddings updated as a running centroid of linked user items. The solution avoids separate vector and relational stores by handling both matching and aggregation in a single database.

read3 min views1 publishedJun 29, 2026

Every review app is a silo. Yelp reviews places, Amazon reviews its own catalog, Letterboxd reviews film. I wanted to build the opposite: one place where you can log and rate anything β€” a ball-point pen, a burger, a cruise β€” and have your review pool together with everyone else's.

The hard part isn't the form. It's this: when I log "In-N-Out Double-Double" and you log "in n out double double burger," those need to become the same thing so our ratings aggregate. No shared product ID, no barcode, no agreement on spelling. Just two humans describing the same item differently.

That's a similarity problem, and I solved it with ** pgvector running inside Amazon Aurora PostgreSQL (Serverless v2)** β€” the whole app, deployed on Vercel at

Two tables carry the entire idea:

user_items

canonical_items

The matcher's only job is connecting a user_item

to the right canonical_item

. That edge is the product.

Every item carries a 1024-dim embedding. The canonical row's embedding is a vector(1024)

, indexed with HNSW for cosine ANN search:

// db/schema.ts (Drizzle)
embedding: vector("embedding", { dimensions: 1024 }).notNull(),
// ...
index("idx_canon_embedding").using("hnsw", t.embedding.op("vector_cosine_ops")),

I made the canonical embedding ** NOT NULL on purpose**: a canonical entry must always be matchable, so every creation path supplies a vector and the recompute logic preserves it rather than ever averaging to NULL.

On every add, I embed the new item and run an approximate-nearest-neighbour search over the canonical catalog. Drizzle doesn't model vector operators, so the query is a raw sql

template:

SELECT id, name, photo_url, rating_avg, rating_count,
       1 - (embedding <=> :q) AS similarity
FROM canonical_items
WHERE embedding IS NOT NULL
ORDER BY embedding <=> :q
LIMIT 10;

<=>

is pgvector's cosine distance; 1 - distance

is similarity. A high-confidence hit auto-suggests as the default ("we think this is it"); otherwise the user sees the top candidates plus a "None of these β€” create new" escape hatch. Matching is optional and non-blocking: the user_item

is saved instantly with canonical_item_id = NULL

, and the match just fills it in β€” at upload time or from a queue later. The base loop (track my own stuff) never has friction.

Here's the part I'm proud of. A canonical item's embedding is the running centroid of its linked members' embeddings. When you link your burger log to mine, the canonical vector moves to the average of both. The more people log the same thing, the sharper and more representative that vector becomes β€” the catalog improves itself with use.

And because "sort by rating" needs to stay fast at scale, the rating aggregates are denormalized onto the canonical row, with rating_avg

as a stored generated column:

rating_avg real GENERATED ALWAYS AS
  (CASE WHEN rating_count > 0 THEN rating_sum::real / rating_count END) STORED

So the canonical page β€” the payoff screen β€” reads one row to show "4.6β˜… from 11 reviews across strangers," no aggregation query at render time.

I evaluated the three qualifying AWS databases:

pgvector

and PostGIS

extensions, so it can't do the core matching feature.That last point is the whole reason this was buildable by one person on a deadline. The matcher result and the relational aggregation live in the same database, so linking an item and recomputing its canonical's centroid + ratings is a single transaction β€” not a dance between a vector store and an RDBMS.

Next.js 16 (App Router, Server Actions) on Vercel β†’ Aurora PostgreSQL Serverless v2 via pg

/Drizzle, with pgvector

  • PostGIS

  • pg_trgm

. Embeddings come from Amazon Bedrock (more on that in the next post). Swapping local Docker Postgres for Aurora was a one-line DATABASE_URL

change β€” same driver, same SQL.

The result is a universal, cross-user review graph for literally anything, and the magic is one line of SQL: ORDER BY embedding <=> :q

.

Built for the H0 Hackathon ("Hack the Zero Stack with Vercel and AWS Databases"). I created this content for the purposes of entering this hackathon. #H0Hackathon

── more in #machine-learning 4 stories Β· sorted by recency
── more on @amazon aurora 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain β€” perfect for shipping the agent you just read about.

$git push zahid main
β†’ Live at https://your-agent.zahid.host βœ“
Get free account β†’ Pricing
from €0/mo Β· no card required
LIVE [news/building-a-cross-use…] indexed:0 read:3min 2026-06-29 Β· β€”