# AI + TMDB: 3 Passes to Match Torrent Posters — Prompt Iteration With Real Numbers

> Source: <https://dev.to/ohugonnot/ai-tmdb-3-passes-to-match-torrent-posters-prompt-iteration-with-real-numbers-bl7>
> Published: 2026-05-30 09:00:04+00:00

[ShareBox](https://www.web-developpeur.com/blog/sharebox-peer-programming-ia) displays shared folders as a Netflix-style grid with TMDB posters. The problem: folder names come from torrents. `Naruto.INTEGRALE.MULTI.VFF.1080p.BluRay.x264-AMB3R` needs to match "Naruto" on TMDB, not "Naruto Shippuden" and not "Naruto the Movie". And `Vol 1` must definitely not match "Kill Bill: Volume 1".

Basic regex + TMDB search works for 80% of cases. For the remaining 20%, I built a 3-pass AI pipeline (Claude Haiku via CLI) with a cron every 30 minutes. Here's each pass in detail, the exact prompts, and iterations measured on 290 real entries.
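To make the "basic regex" step concrete, here is a minimal Python sketch of a cleaner in the spirit of the `extract_title_year()` function this post mentions. The tag list and the cut-at-first-tag strategy are assumptions for illustration, not ShareBox's actual code, and the TMDB search itself is left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of release tags; the real pipeline's list is not shown in the post.
RELEASE_TAGS = re.compile(
    r"\b(integrale|multi|vff|vf|vostfr|french|truefrench|1080p|720p|2160p|"
    r"bluray|bdrip|webrip|web-dl|hdtv|x264|x265|h264|hevc|collection)\b",
    re.IGNORECASE,
)
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")


def extract_title_year(folder_name: str) -> Tuple[str, Optional[int]]:
    """Return (cleaned title, year or None) from a torrent-style folder name."""
    # Dots and underscores act as word separators in release names.
    text = re.sub(r"[._]+", " ", folder_name)
    year_match = YEAR.search(text)
    year = int(year_match.group(1)) if year_match else None
    # Keep only what precedes the first year or release tag, whichever comes first.
    cut = len(text)
    for m in (year_match, RELEASE_TAGS.search(text)):
        if m:
            cut = min(cut, m.start())
    title = text[:cut].strip(" -([")
    return title, year
```

For the folder name from the introduction, this returns `("Naruto", None)`. A title that itself starts with a year would come back empty with this naive cut, which is the kind of case the AI passes are meant to catch.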

The architecture is layered, cheapest to most expensive:

`extract_title_year()` cleans the name, searches TMDB, and takes the first result with a poster. Free, instant, correct ~80% of the time.

The first prompt was simple: "extract the proper movie title for a TMDB search." Tested on 290 real names, it produced **72 false skips**: the AI considered "Naruto.INTEGRALE", "Pokemon La Series", "Despicable Me COLLECTION" as non-titles and marked them `skip=true`.

The fix: explicit rules about what to keep vs. skip, a "when in doubt, skip=false" rule, and instructions to translate known English titles to French. Result: **72 → 41 skips**. 31 improvements, zero regressions.
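Counting improvements and regressions can be done mechanically once every benchmark entry has a verdict for each prompt version. A minimal sketch, assuming results are stored as a mapping from folder name to whether the entry was handled correctly (the function names and data shape are hypothetical, not taken from ShareBox):

```python
from typing import Dict, List, Tuple


def compare_runs(before: Dict[str, bool], after: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Compare two prompt versions on the same benchmark.

    Each dict maps an entry name to True when that entry was handled correctly.
    Returns (improvements, regressions) as lists of entry names.
    """
    # Wrong before, right after.
    improvements = [n for n in before if not before[n] and after.get(n, False)]
    # Right before, wrong after: the cases a prompt change must not introduce.
    regressions = [n for n in before if before[n] and not after.get(n, False)]
    return improvements, regressions


def reduction(before_count: int, after_count: int) -> int:
    """Percentage reduction in an error count, rounded to the nearest integer."""
    return round(100 * (before_count - after_count) / before_count)
```

With the pass 1 numbers, `reduction(72, 41)` gives 43, matching the 43% drop in skips.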

The verification prompt sent {name, TMDB title} pairs and asked for `correct: true/false`. On 247 entries, it flagged **55 as incorrect**. But 46 of those were false negatives.

The AI didn't know that `S01 → "Season 1"` is a correct match: it's a TMDB season poster, not a generic match. Same for all 34 Simpsons seasons, 11 Walking Dead seasons, 4 Batman seasons.

The fix: a "Special cases — do NOT mark as incorrect" section explaining that season folders matched to season titles are correct, and translations/saga names are fine. Result: **55 → 9 incorrects**. All 9 are real problems. Zero false negatives.

When pass 2 detects a false positive and suggests "Naruto" as a better title, we search TMDB. Problem: TMDB returns results by popularity. "Naruto" → Naruto Shippuden (more popular). Taking the first result reproduces the error.

The solution: get 15 TMDB candidates (via the multi, tv, and movie endpoints) and send the full list to the AI with the filename for context. The AI picks `{"idx": 1}`, which is Naruto (2002), the original series. The word "INTEGRALE" in the filename helps it understand this is the complete series, not a spin-off.
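The candidate-gathering step can be sketched as follows. This is a hedged Python illustration, not the post's code: it assumes the three TMDB search responses have already been fetched and that their `results` arrays are passed in keyed by endpoint. The helper names `merge_candidates` and `format_candidates` are hypothetical. The field names (`media_type`, `name`/`first_air_date` for TV, `title`/`release_date` for movies) follow TMDB's search responses.

```python
from typing import Dict, List


def merge_candidates(results_by_endpoint: Dict[str, List[dict]], limit: int = 15) -> List[dict]:
    """Merge TMDB search results from several endpoints, dropping duplicates."""
    seen = set()
    merged = []
    for endpoint, results in results_by_endpoint.items():
        for item in results:
            # multi-search items carry media_type; tv and movie search items do not.
            media_type = item.get("media_type", endpoint)
            if media_type == "person":
                continue  # multi search also returns people, which have no poster to match
            key = (media_type, item["id"])
            if key in seen:
                continue
            seen.add(key)
            merged.append({
                "id": item["id"],
                "media_type": media_type,
                # TV results use name/first_air_date, movies use title/release_date.
                "title": item.get("name") or item.get("title") or "",
                "year": (item.get("first_air_date") or item.get("release_date") or "")[:4],
            })
    return merged[:limit]


def format_candidates(folder_name: str, candidates: List[dict]) -> str:
    """Build the numbered list sent to the model alongside the folder name."""
    lines = [f"Folder: {folder_name}", "Candidates:"]
    for idx, c in enumerate(candidates):
        lines.append(f'{idx}: {c["title"]} ({c["year"] or "?"}) [{c["media_type"]}]')
    return "\n".join(lines)
```

Numbering the candidates lets the model answer with a bare index, and including the year and media type gives it what it needs to tell the 2002 series apart from a later spin-off or a movie.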

A gotcha: Claude sometimes adds explanations after the JSON, breaking parsing. Fix: extract `{"idx": N}` via regex instead of full JSON parsing.
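A minimal Python sketch of that regex extraction is shown here. The pattern and the bounds check are assumptions about a reasonable implementation, and `extract_idx` is a hypothetical name; the post mentions `json_decode()`, so the original is presumably not Python.

```python
import re
from typing import Optional

# Matches {"idx": N} anywhere in the reply, tolerating whitespace, code fences, and trailing prose.
IDX_PATTERN = re.compile(r'\{\s*"idx"\s*:\s*(-?\d+)\s*\}')


def extract_idx(reply: str, n_candidates: int) -> Optional[int]:
    """Return the chosen candidate index, or None if absent or out of range."""
    match = IDX_PATTERN.search(reply)
    if match is None:
        return None
    idx = int(match.group(1))
    # Reject indices that do not point at a real candidate.
    return idx if 0 <= idx < n_candidates else None
```

Returning `None` for a missing or out-of-range index lets the caller leave the entry for a later cron run rather than store a wrong poster.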

| Prompt | Before | After | Improvement |
|---|---|---|---|
| Pass 1 (extraction) | 72 false skips | 41 | -43% |
| Pass 2 (verification) | 55 flagged incorrect (46 false negatives) | 9 (all real) | -84% |
| Pass 3 (candidate pick) | 4 parse failures | 0 | -100% |

**Measure before iterating.** Without 290 real entries as a benchmark, I would have iterated blindly. The numbers showed that 84% of pass 2 v1's "incorrect" flags (46 of 55) were false negatives, which is impossible to see without real data.

**Edge cases dominate.** 46 of the 55 "incorrect" flags were false negatives, and they came from one pattern: season folders. One line in the prompt ("seasons matched to Season N are CORRECT") eliminated 84% of the flags. The 80/20 rule applies to prompts too.

**Parsing matters as much as the prompt.** A perfect prompt is useless if parsing breaks. The AI adds text, code fences, explanations. Regex extraction is more reliable than `json_decode()`.

**Layered architecture reduces costs.** Free regex handles 80%. AI only runs on the remaining 20%. Pass 3 (the most expensive) only fires when pass 2 detects a problem — 9 times out of 290 entries.

The best prompt isn't the one with the most instructions — it's the one that precisely describes edge cases. "When in doubt, skip=false" and "seasons are CORRECT" are worth more than 20 lines of generic rules.
