Hey — sharing a project I've been building for the last
few months. It's a movie recommendation system that runs entirely on
your laptop using Ollama, with a Corrective-RAG pipeline.
Why I built it: existing streaming platforms only know what you
watched on them. Netflix can't see my Prime history, and none of them
know what I watched at the cinema. I wanted one system that learns from
all of it.
Stack:
The interesting design choice was query expansion at INGEST time instead
of query time. The enrichment LLM generates 3-5 pseudo-queries per movie
and embeds them alongside the plot. Catalogues are bounded; user queries
aren't, so paying the LLM cost once per movie scales better than once
per query.
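
To make the ingest-time idea concrete, here's a minimal sketch, not the repo's actual code. It assumes each pseudo-query gets its own vector stored next to the plot vector under the same movie id (one way to read "alongside"). `ingest_movie`, `IndexEntry`, `generate_queries` and `embed` are hypothetical names; the two callables stand in for the enrichment LLM and the embedding model, and are stubbed here so the snippet runs without Ollama.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class IndexEntry:
    movie_id: str
    text: str            # the text that was embedded
    kind: str            # "plot" or "pseudo_query"
    vector: List[float]


def ingest_movie(
    movie_id: str,
    plot: str,
    generate_queries: Callable[[str], List[str]],  # hypothetical: wraps the enrichment LLM
    embed: Callable[[str], List[float]],           # hypothetical: wraps the embedding model
    max_queries: int = 5,
) -> List[IndexEntry]:
    """Pay the LLM cost once per movie: expand at ingest, not at query time."""
    entries = [IndexEntry(movie_id, plot, "plot", embed(plot))]
    for query in generate_queries(plot)[:max_queries]:
        entries.append(IndexEntry(movie_id, query, "pseudo_query", embed(query)))
    return entries


# Stubs so the sketch runs offline; a real setup would call the local models.
def fake_generate_queries(plot: str) -> List[str]:
    return [f"movies about {word}" for word in plot.lower().split()[:3]]


def fake_embed(text: str) -> List[float]:
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


entries = ingest_movie(
    "movie-001", "Heist crew dreams", fake_generate_queries, fake_embed
)
for entry in entries:
    print(entry.kind, "|", entry.text)
```

At query time the user's text is embedded once and matched against all entries; a hit on a pseudo-query resolves back to its movie through `movie_id`, so no LLM call is needed on the retrieval path.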
Latency on an M3 with 36GB RAM, Ollama llama3: ~90s/query (filter_extract +
explain dominate). llama3.2:1b drops that to ~15-20s. Hosted models: ~5-10s.
Code + setup: github.com/meetgrewal7793-creator/personal-movie-recommender
The 7-stage architecture diagram is in the README. Feedback welcome —
especially on the grader prompt calibration, which I had to relax for
local-LLM defaults because llama3 graders over-flag results as weak.