I built a local-first movie recommender with Corrective-RAG (cited explanations, hybrid retrieval, runs entirely on Ollama)

A developer built a local-first movie recommendation system using a Corrective-RAG pipeline that runs entirely on Ollama. The system employs query expansion at ingestion time rather than query time, generating 3-5 pseudo-queries per movie to improve scalability. On an M3 Mac with 36GB RAM, the system achieves approximately 90-second query latency with llama3, dropping to 15-20 seconds with llama3.2:1b.

Hey, sharing a project I've been building for the last few months. It's a movie recommendation system that runs entirely on your laptop using Ollama, with a Corrective-RAG pipeline.

Why I built it: existing streaming platforms only know what you watched on them. Netflix can't see my Prime history, and none of them know what I watch at the cinema. I wanted one system that learns from all of it.

Stack: the interesting design choice was query expansion at INGEST time instead of query time. The enrichment LLM generates 3-5 pseudo-queries per movie and embeds them alongside the plot. Catalogues are bounded; user queries aren't, so paying the LLM cost once per movie scales better than once per query.

Latency on M3 / 36GB / Ollama llama3: ~90s/query (filter extract + explain dominate). llama3.2:1b drops to ~15-20s. Hosted models: ~5-10s.

Code + setup: github.com/meetgrewal7793-creator/personal-movie-recommender

The 7-stage architecture diagram is in the README.

Feedback welcome, especially on the grader prompt calibration, which I had to relax for local-LLM defaults because llama3 graders over-flag results as weak.
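To make the ingest-time expansion idea concrete, here is a minimal sketch. It is not the repo's code. `toy_embed` and `toy_pseudo_queries` are hypothetical stand-ins for the embedding model and the enrichment LLM call (a real setup would presumably call Ollama for both), and the two-movie catalogue is made up for illustration. What it shows is the shape of the trade-off: the expensive generation step runs once per movie when the index is built, and a search only embeds the user's query.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]

# Made-up two-movie catalogue, for illustration only.
MOVIES = {
    "Alien": "The crew of a commercial spacecraft encounters a deadly "
             "lifeform after investigating a distress signal",
    "Paddington": "A young bear from Peru travels to London and is "
                  "taken in by the Brown family",
}


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for an embedding model: unit-length bag of words."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def toy_pseudo_queries(title: str, plot: str) -> List[str]:
    """Hypothetical stand-in for the enrichment LLM: canned pseudo-queries."""
    canned = {
        "Alien": ["claustrophobic sci fi horror in space",
                  "scary monster movie on a spaceship",
                  "slow burn survival thriller"],
        "Paddington": ["cosy family comedy",
                       "feel good movie for kids",
                       "heartwarming talking animal film"],
    }
    return canned.get(title, [])


def cosine(a: Vector, b: Vector) -> float:
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def build_index(movies: Dict[str, str],
                generate: Callable[[str, str], List[str]],
                embed: Callable[[str], Vector]) -> List[Tuple[str, Vector]]:
    """Ingest: one generate() call per movie, then embed plot + pseudo-queries."""
    index = []
    for title, plot in movies.items():
        for text in [plot] + generate(title, plot):
            index.append((title, embed(text)))
    return index


def search(query: str, index: List[Tuple[str, Vector]],
           embed: Callable[[str], Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Query: embed the query only; score each movie by its best-matching vector."""
    q = embed(query)
    best: Dict[str, float] = {}
    for title, vec in index:
        best[title] = max(best.get(title, 0.0), cosine(q, vec))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]


index = build_index(MOVIES, toy_pseudo_queries, toy_embed)
print(search("feel good movie for kids", index, toy_embed, k=1))
```

In this toy setup, a query phrased the way a user would ask shares no words with either plot summary, so a plot-only index has nothing to match on. The pseudo-queries bridge that vocabulary gap without any LLM call at search time.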