Every review app is a silo. Yelp reviews places, Amazon reviews its own catalog, Letterboxd reviews film. I wanted to build the opposite: one place where you can log and rate anything β a ball-point pen, a burger, a cruise β and have your review pool together with everyone else's.
The hard part isn't the form. It's this: when I log "In-N-Out Double-Double" and you log "in n out double double burger," those need to become the same thing so our ratings aggregate. No shared product ID, no barcode, no agreement on spelling. Just two humans describing the same item differently.
That's a similarity problem, and I solved it with ** pgvector running inside Amazon Aurora PostgreSQL (Serverless v2)** β the whole app, deployed on Vercel at
Two tables carry the entire idea:
user_items
canonical_items
The matcher's only job is connecting a user_item
to the right canonical_item
. That edge is the product.
Every item carries a 1024-dim embedding. The canonical row's embedding is a vector(1024)
, indexed with HNSW for cosine ANN search:
// db/schema.ts (Drizzle)
embedding: vector("embedding", { dimensions: 1024 }).notNull(),
// ...
index("idx_canon_embedding").using("hnsw", t.embedding.op("vector_cosine_ops")),
I made the canonical embedding ** NOT NULL on purpose**: a canonical entry must always be matchable, so every creation path supplies a vector and the recompute logic preserves it rather than ever averaging to NULL.
On every add, I embed the new item and run an approximate-nearest-neighbour search over the canonical catalog. Drizzle doesn't model vector operators, so the query is a raw sql
template:
SELECT id, name, photo_url, rating_avg, rating_count,
1 - (embedding <=> :q) AS similarity
FROM canonical_items
WHERE embedding IS NOT NULL
ORDER BY embedding <=> :q
LIMIT 10;
<=>
is pgvector's cosine distance; 1 - distance
is similarity. A high-confidence hit auto-suggests as the default ("we think this is it"); otherwise the user sees the top candidates plus a "None of these β create new" escape hatch. Matching is optional and non-blocking: the user_item
is saved instantly with canonical_item_id = NULL
, and the match just fills it in β at upload time or from a queue later. The base loop (track my own stuff) never has friction.
Here's the part I'm proud of. A canonical item's embedding is the running centroid of its linked members' embeddings. When you link your burger log to mine, the canonical vector moves to the average of both. The more people log the same thing, the sharper and more representative that vector becomes β the catalog improves itself with use.
And because "sort by rating" needs to stay fast at scale, the rating aggregates are denormalized onto the canonical row, with rating_avg
as a stored generated column:
rating_avg real GENERATED ALWAYS AS
(CASE WHEN rating_count > 0 THEN rating_sum::real / rating_count END) STORED
So the canonical page β the payoff screen β reads one row to show "4.6β from 11 reviews across strangers," no aggregation query at render time.
I evaluated the three qualifying AWS databases:
pgvector
and PostGIS
extensions, so it can't do the core matching feature.That last point is the whole reason this was buildable by one person on a deadline. The matcher result and the relational aggregation live in the same database, so linking an item and recomputing its canonical's centroid + ratings is a single transaction β not a dance between a vector store and an RDBMS.
Next.js 16 (App Router, Server Actions) on Vercel β Aurora PostgreSQL Serverless v2 via pg
/Drizzle, with pgvector
-
PostGIS -
pg_trgm
. Embeddings come from Amazon Bedrock (more on that in the next post). Swapping local Docker Postgres for Aurora was a one-line DATABASE_URL
change β same driver, same SQL.
The result is a universal, cross-user review graph for literally anything, and the magic is one line of SQL: ORDER BY embedding <=> :q
.
Built for the H0 Hackathon ("Hack the Zero Stack with Vercel and AWS Databases"). I created this content for the purposes of entering this hackathon. #H0Hackathon