{"slug": "ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers", "title": "AI + TMDB: 3 Passes to Match Torrent Posters — Prompt Iteration With Real Numbers", "summary": "A developer built a 3-pass AI pipeline using Claude Haiku to match torrent folder names to TMDB posters, improving accuracy from 80% to near-perfect on 290 real entries. The system reduced false skips by 43% in pass one, false negatives by 84% in pass two, and eliminated all parse failures in pass three. The key insight: precise edge-case rules like \"seasons matched to Season N are CORRECT\" proved more valuable than generic instructions.", "body_md": "[ShareBox](https://www.web-developpeur.com/blog/sharebox-peer-programming-ia) displays shared folders as a Netflix-style grid with TMDB posters. The problem: folder names come from torrents. `Naruto.INTEGRALE.MULTI.VFF.1080p.BluRay.x264-AMB3R` needs to match \"Naruto\" on TMDB, not \"Naruto Shippuden\" and not \"Naruto the Movie\". And `Vol 1` must definitely not match \"Kill Bill: Volume 1\".\n\nBasic regex + TMDB search works for 80% of cases. For the remaining 20%, I built a 3-pass AI pipeline (Claude Haiku via CLI), run by a cron job every 30 minutes. Below is each pass in detail, with the prompt changes and the results of each iteration, measured on 290 real entries.\n\nThe architecture is layered from cheapest to most expensive. The base layer, `extract_title_year()`, cleans the name, searches TMDB and takes the first result that has a poster. It is free, instant, and correct ~80% of the time.\n\nIn pass 1 (title extraction), the first prompt was simple: \"extract the proper movie title for a TMDB search.\" Tested on 290 real names, it produced **72 false skips**. The AI considered \"Naruto.INTEGRALE\", \"Pokemon La Series\" and \"Despicable Me COLLECTION\" to be non-titles and marked them `skip=true`.\n\nThe fix: explicit rules about what to keep vs. skip, a \"when in doubt, skip=false\" rule, and instructions to translate known English titles to French. Result: **72 → 41 skips**. 
31 improvements, zero regressions.\n\nIn pass 2 (verification), the prompt sent {name, TMDB title} pairs and asked for `correct: true/false`. On 247 entries, it flagged **55 as incorrect**, but 46 of those were false negatives.\n\nThe AI didn't know that `S01 → \"Season 1\"` is a correct match: it's a TMDB season poster, not a generic match. The same applied to all 34 Simpsons seasons, 11 Walking Dead seasons and 4 Batman seasons.\n\nThe fix: a \"Special cases — do NOT mark as incorrect\" section explaining that season folders matched to season titles are correct, and that translations and saga names are fine. Result: **55 → 9 flagged as incorrect**. All 9 are real problems, with zero false negatives.\n\nPass 3 (candidate pick) runs when pass 2 detects a false positive and suggests a better title, such as \"Naruto\", which we then search on TMDB. The problem: TMDB ranks results by popularity, so \"Naruto\" returns Naruto Shippuden first. Taking the first result reproduces the error.\n\nThe solution: fetch 15 TMDB candidates (via the multi, tv and movie endpoints) and send the full list to the AI, with the filename for context. The AI picks `{\"idx\": 1}`, which is Naruto (2002), the original series. The word \"INTEGRALE\" in the filename helps it understand that this is the complete series, not a spin-off.\n\nA gotcha: Claude sometimes adds explanations after the JSON, which breaks parsing. The fix: extract `{\"idx\": N}` with a regex instead of parsing the whole response as JSON.\n\n| Prompt | Before | After | Improvement |\n|---|---|---|---|\n| Pass 1 (extraction) | 72 false skips | 41 | -43% |\n| Pass 2 (verification) | 55 flagged incorrect (46 false negatives) | 9 (all real) | -84% |\n| Pass 3 (candidate pick) | 4 parse failures | 0 | -100% |\n\n**Measure before iterating.** Without 290 real entries as a benchmark, I would have iterated blindly. The numbers showed that 84% of pass 2 v1's \"incorrect\" flags (46 of 55) were false negatives. That is impossible to see without real data.\n\n**Edge cases dominate.** Of the 55 flags, 46 were false negatives, and they came from one pattern: season folders. One line in the prompt (\"seasons matched to Season N are CORRECT\") eliminated 84% of errors. 
The 80/20 rule applies to prompts too.\n\n**Parsing matters as much as the prompt.** A perfect prompt is useless if parsing breaks. The AI wraps its answer in extra text, code fences and explanations, so regex extraction is more reliable than `json_decode()`.\n\n**Layered architecture reduces costs.** The free regex + TMDB layer handles 80% of entries, and the AI only runs on the remaining 20%. Pass 3, the most expensive, only fires when pass 2 detects a problem: 9 times out of 290 entries.\n\nThe best prompt isn't the one with the most instructions; it's the one that precisely describes the edge cases. \"When in doubt, skip=false\" and \"seasons are CORRECT\" are worth more than 20 lines of generic rules.", "url": "https://wpnews.pro/news/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers", "canonical_source": "https://dev.to/ohugonnot/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers-bl7", "published_at": "2026-05-30 09:00:04+00:00", "updated_at": "2026-05-30 09:41:52.047222+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-tools", "ai-products", "natural-language-processing"], "entities": ["TMDB", "Claude Haiku", "ShareBox", "Naruto", "Kill Bill"], "alternates": {"html": "https://wpnews.pro/news/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers", "markdown": "https://wpnews.pro/news/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers.md", "text": "https://wpnews.pro/news/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers.txt", "jsonld": "https://wpnews.pro/news/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers.jsonld"}}