{"slug": "building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora", "title": "Building a cross-user review graph with pgvector on Amazon Aurora", "summary": "A developer built a universal review graph that aggregates ratings for any item across users using pgvector on Amazon Aurora PostgreSQL Serverless v2. The system matches differently spelled item descriptions via cosine similarity on 1024-dim embeddings, with canonical embeddings updated as a running centroid of linked user items. The solution avoids separate vector and relational stores by handling both matching and aggregation in a single database.", "body_md": "Every review app is a silo. Yelp reviews places, Amazon reviews its own catalog, Letterboxd reviews film. I wanted to build the opposite: one place where you can log and rate **anything** (a ball-point pen, a burger, a cruise) and have your review pool together with everyone else's.\n\nThe hard part isn't the form. It's this: when *I* log \"In-N-Out Double-Double\" and *you* log \"in n out double double burger,\" those need to become **the same thing** so our ratings aggregate. No shared product ID, no barcode, no agreement on spelling. Just two humans describing the same item differently.\n\nThat's a similarity problem, and I solved it with **pgvector running inside Amazon Aurora PostgreSQL (Serverless v2)**, with the whole app deployed on Vercel at\n\nTwo tables carry the entire idea:\n\n`user_items`\n\n`canonical_items`\n\nThe matcher's only job is connecting a `user_item` to the right `canonical_item`. **That edge is the product.**\n\nEvery item carries a 1024-dim embedding. 
The canonical row's embedding is a `vector(1024)`, indexed with HNSW for cosine ANN search:\n\n```ts\n// db/schema.ts (Drizzle)\nembedding: vector(\"embedding\", { dimensions: 1024 }).notNull(),\n// ...\nindex(\"idx_canon_embedding\").using(\"hnsw\", t.embedding.op(\"vector_cosine_ops\")),\n```\n\nI made the canonical embedding **NOT NULL on purpose**: a canonical entry must always be matchable, so every creation path supplies a vector and the recompute logic preserves it rather than ever averaging to NULL.\n\nOn every add, I embed the new item and run an approximate-nearest-neighbour search over the canonical catalog. Drizzle doesn't model vector operators, so the query is a raw `sql` template:\n\n```sql\nSELECT id, name, photo_url, rating_avg, rating_count,\n       1 - (embedding <=> :q) AS similarity\nFROM canonical_items\nWHERE embedding IS NOT NULL\nORDER BY embedding <=> :q\nLIMIT 10;\n```\n\n`<=>` is pgvector's cosine distance; `1 - distance` is similarity. A high-confidence hit is auto-suggested as the default (\"we think this is it\"); otherwise the user sees the top candidates plus a \"None of these — create new\" escape hatch. **Matching is optional and non-blocking**: the `user_item` is saved instantly with `canonical_item_id = NULL`, and the match fills it in afterwards, either at upload time or later from a queue. The base loop (track my own stuff) never has friction.\n\nHere's the part I'm proud of. A canonical item's embedding is the **running centroid** of its linked members' embeddings. When you link your burger log to mine, the canonical vector moves to the average of both. 
The more people log the same thing, the *sharper and more representative* that vector becomes: the catalog improves itself with use.\n\nAnd because \"sort by rating\" needs to stay fast at scale, the rating aggregates are **denormalized onto the canonical row**, with `rating_avg` as a stored generated column:\n\n```sql\nrating_avg real GENERATED ALWAYS AS\n  (CASE WHEN rating_count > 0 THEN rating_sum::real / rating_count END) STORED\n```\n\nSo the canonical page (the payoff screen) reads one row to show \"4.6★ from 11 reviews across strangers,\" with no aggregation query at render time.\n\nI evaluated the three qualifying AWS databases:\n\n`pgvector` and `PostGIS` extensions, so it can't do the core matching feature.\n\nThat last point is the whole reason this was buildable by one person on a deadline. The matcher result and the relational aggregation live in the same database, so linking an item and recomputing its canonical's centroid + ratings is a single transaction, not a dance between a vector store and an RDBMS.\n\nNext.js 16 (App Router, Server Actions) on **Vercel** → **Aurora PostgreSQL Serverless v2** via `pg`/Drizzle, with `pgvector` + `PostGIS` + `pg_trgm`. Embeddings come from Amazon Bedrock (more on that in the next post). Swapping local Docker Postgres for Aurora was a one-line `DATABASE_URL` change: same driver, same SQL.\n\nThe result is a universal, cross-user review graph for literally anything, and the magic is one line of SQL: `ORDER BY embedding <=> :q`.\n\n*Built for the H0 Hackathon (\"Hack the Zero Stack with Vercel and AWS Databases\"). 
I created this content for the purposes of entering this hackathon.* **#H0Hackathon**", "url": "https://wpnews.pro/news/building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora", "canonical_source": "https://dev.to/jareddlewis/building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora-7ni", "published_at": "2026-06-29 04:19:03+00:00", "updated_at": "2026-06-29 04:27:11.520387+00:00", "lang": "en", "topics": ["machine-learning", "developer-tools", "ai-infrastructure"], "entities": ["Amazon Aurora", "pgvector", "Vercel", "Drizzle", "Amazon Bedrock", "PostgreSQL"], "alternates": {"html": "https://wpnews.pro/news/building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora", "markdown": "https://wpnews.pro/news/building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora.md", "text": "https://wpnews.pro/news/building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora.txt", "jsonld": "https://wpnews.pro/news/building-a-cross-user-review-graph-with-pgvector-on-amazon-aurora.jsonld"}}