[ARTICLE · art-18426] src=dev.to pub= topic=artificial-intelligence verified=true sentiment=· neutral

AI + TMDB: 3 Passes to Match Torrent Posters — Prompt Iteration With Real Numbers

A developer built a 3-pass AI pipeline using Claude Haiku to match torrent folder names to TMDB posters, improving accuracy from 80% to near-perfect on 290 real entries. The system reduced false skips by 43% in pass one, false negatives by 84% in pass two, and eliminated all parse failures in pass three. The key insight: precise edge-case rules like "seasons matched to Season N are CORRECT" proved more valuable than generic instructions.

3 min read · published May 30, 2026

ShareBox displays shared folders as a Netflix-style grid with TMDB posters. The problem: folder names come from torrents. Naruto.INTEGRALE.MULTI.VFF.1080p.BluRay.x264-AMB3R needs to match "Naruto" on TMDB, not "Naruto Shippuden" and not "Naruto the Movie". And Vol 1 must definitely not match "Kill Bill: Volume 1".

Basic regex + TMDB search works for 80% of cases. For the remaining 20%, I built a 3-pass AI pipeline (Claude Haiku via CLI) with a cron every 30 minutes. Here's each pass in detail, the exact prompts, and iterations measured on 290 real entries.
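To make the "basic regex" layer concrete, here is a minimal Python sketch of a name cleaner. It is not the author's code: the function name clean_title, the list of release tags, and the cut-at-first-tag heuristic are all assumptions for illustration, and the TMDB search that would follow is left out.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of release tags; a real list would be much longer.
NOISE = re.compile(
    r"\b(INTEGRALE|MULTI|VFF|VOSTFR|FRENCH|2160p|1080p|720p|BluRay|"
    r"WEB-?DL|WEBRip|HDTV|x264|x265|HEVC)\b",
    re.IGNORECASE,
)
YEAR = re.compile(r"\b(19\d{2}|20\d{2})\b")


def clean_title(name: str) -> Tuple[str, Optional[str]]:
    """Guess (title, year) from a torrent-style folder name."""
    text = re.sub(r"[._]+", " ", name)  # dots and underscores act as spaces
    year_match = YEAR.search(text)
    noise_match = NOISE.search(text)
    year = year_match.group(1) if year_match else None
    # Cut at the first year or release tag, ignoring a match at position 0
    # so that titles starting with a number are not emptied.
    cuts = [m.start() for m in (year_match, noise_match) if m and m.start() > 0]
    if cuts:
        text = text[: min(cuts)]
    return text.strip(" -"), year


print(clean_title("Naruto.INTEGRALE.MULTI.VFF.1080p.BluRay.x264-AMB3R"))
# ('Naruto', None)
```

A cleaner like this yields a search string, but it cannot tell "Naruto" from "Naruto Shippuden" in the results, which is where the AI passes come in.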

The architecture is layered, cheapest to most expensive:

extract_title_year() cleans the name, searches TMDB, and takes the first result with a poster. Free, instant, correct ~80% of the time.

The first prompt was simple: "extract the proper movie title for a TMDB search." Tested on 290 real names, it produced 72 false skips: the AI considered "Naruto.INTEGRALE", "Pokemon La Series", and "Despicable Me COLLECTION" to be non-titles and marked them skip=true.

The fix: explicit rules about what to keep vs. skip, a "when in doubt, skip=false" rule, and instructions to translate known English titles to French. Result: 72 → 41 skips. 31 improvements, zero regressions.
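Counting "improvements" and "regressions" between two prompt versions requires comparing both runs against the same labeled names. The Python sketch below shows one way to do that. The function compare_runs and the hand-labeled expected flags are assumptions for illustration, not the author's benchmark harness.

```python
from typing import Dict, List


def compare_runs(
    expected: Dict[str, bool],
    before: Dict[str, bool],
    after: Dict[str, bool],
) -> Dict[str, List[str]]:
    """Compare skip flags from two prompt versions against hand labels.

    Every dict maps a folder name to a skip flag. `expected` holds the
    hand-labeled truth (an assumption of this sketch).
    """
    improved, regressed = [], []
    for name, want in expected.items():
        was_right = before[name] == want
        is_right = after[name] == want
        if is_right and not was_right:
            improved.append(name)   # wrong in v1, fixed in v2
        elif was_right and not is_right:
            regressed.append(name)  # right in v1, broken in v2
    return {"improved": improved, "regressed": regressed}


expected = {"Naruto.INTEGRALE": False, "Pokemon La Series": False}
v1 = {"Naruto.INTEGRALE": True, "Pokemon La Series": True}
v2 = {"Naruto.INTEGRALE": False, "Pokemon La Series": True}
print(compare_runs(expected, v1, v2))
# {'improved': ['Naruto.INTEGRALE'], 'regressed': []}
```

Tracking regressions separately matters: a new rule that fixes 31 names but silently breaks 10 others would look like a net gain in the raw skip count.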

The verification prompt sent {name, TMDB title} pairs and asked correct: true/false. On 247 entries, it flagged 55 as incorrect. But 46 were false negatives.

The AI didn't know that S01 → "Season 1" is a correct match: it's a TMDB season poster, not a generic match. Same for all 34 Simpsons seasons, 11 Walking Dead seasons, 4 Batman seasons.

The fix: a "Special cases — do NOT mark as incorrect" section explaining that season folders matched to season titles are correct, and translations/saga names are fine. Result: 55 → 9 incorrects. All 9 are real problems. Zero false negatives.
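As an illustration of how such a section could be attached to the verification request, here is a Python sketch that assembles a prompt from {name, TMDB title} pairs. The wording of the rules and the JSON field names are hypothetical; the author's exact prompt is not reproduced here.

```python
import json
from typing import Iterable, Tuple

# Hypothetical wording, paraphrasing the rules described above.
SPECIAL_CASES = (
    "Special cases (do NOT mark as incorrect):\n"
    '- A season folder (e.g. S01) matched to a "Season N" title is CORRECT.\n'
    "- Translated titles and saga names are fine."
)


def build_verification_prompt(pairs: Iterable[Tuple[str, str]]) -> str:
    """Build one verification prompt for a batch of (name, tmdb_title) pairs."""
    lines = [
        json.dumps({"name": name, "tmdb_title": title}, ensure_ascii=False)
        for name, title in pairs
    ]
    return (
        "For each pair, decide whether the TMDB title matches the folder name.\n"
        'Answer with one JSON object per pair: {"correct": true} or {"correct": false}.\n\n'
        + SPECIAL_CASES
        + "\n\nPairs:\n"
        + "\n".join(lines)
    )


print(build_verification_prompt([("The.Simpsons.S01", "Season 1")]))
```

Keeping the special cases in a separate constant makes it easy to rerun the benchmark with and without them and measure the difference.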

When pass 2 detects a false positive and suggests "Naruto" as a better title, we search TMDB. Problem: TMDB returns results by popularity. "Naruto" → Naruto Shippuden (more popular). Taking the first result reproduces the error.

The solution: get 15 TMDB candidates (via multi + tv + movie endpoints) and send the full list to the AI with the filename for context. The AI picks {"idx": 1}, which is Naruto (2002), the original series. The word "INTEGRALE" in the filename helps it understand this is the complete series, not a spin-off.
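The candidate-gathering step can be sketched as follows in Python. The network calls are left out: the function takes result lists that have already been fetched from the three search endpoints. The helper names, the 0-based numbering, and the made-up IDs in the example are assumptions; the field names (media_type, title, name, release_date, first_air_date) are the ones TMDB search results use.

```python
from typing import Dict, Iterable, List, Tuple


def merge_candidates(
    tagged_results: Iterable[Tuple[str, List[dict]]], limit: int = 15
) -> List[Dict[str, object]]:
    """Merge search results, drop duplicates and people, keep `limit` items.

    Each entry is (default_type, results). Multi-search items carry their
    own "media_type"; tv and movie results fall back to default_type.
    """
    seen, merged = set(), []
    for default_type, results in tagged_results:
        for item in results:
            media_type = item.get("media_type", default_type)
            if media_type not in ("movie", "tv"):
                continue  # multi search also returns people
            key = (media_type, item["id"])
            if key in seen:
                continue
            seen.add(key)
            title = item.get("title") or item.get("name") or ""
            date = item.get("release_date") or item.get("first_air_date") or ""
            merged.append(
                {"type": media_type, "id": item["id"], "title": title, "year": date[:4]}
            )
            if len(merged) == limit:
                return merged
    return merged


def format_candidates(candidates: List[Dict[str, object]]) -> str:
    """Render a numbered list (0-based) to paste into the pick prompt."""
    return "\n".join(
        f'{i}. {c["title"]} ({c["year"] or "?"}) [{c["type"]}]'
        for i, c in enumerate(candidates)
    )


# IDs below are made up for the example.
multi = [{"media_type": "tv", "id": 1, "name": "Naruto Shippuden", "first_air_date": "2007-02-15"}]
tv = [{"id": 1, "name": "Naruto Shippuden", "first_air_date": "2007-02-15"},
      {"id": 2, "name": "Naruto", "first_air_date": "2002-10-03"}]
print(format_candidates(merge_candidates([("multi", multi), ("tv", tv)])))
# 0. Naruto Shippuden (2007) [tv]
# 1. Naruto (2002) [tv]
```

Including the year and media type in each line gives the model the context it needs to separate an original series from its sequels and films.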

A gotcha: Claude sometimes adds explanations after the JSON, breaking parsing. Fix: extract {"idx": N} via regex instead of full JSON parsing.
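A minimal Python sketch of that extraction is shown below. The original project appears to use PHP (it mentions json_decode()), so this is a translation of the idea rather than the author's code; the function name and the bounds check are assumptions.

```python
import re
from typing import Optional

# Matches {"idx": N} anywhere in the reply, tolerating extra whitespace.
IDX_PATTERN = re.compile(r'\{\s*"idx"\s*:\s*(-?\d+)\s*\}')


def extract_idx(reply: str, n_candidates: int) -> Optional[int]:
    """Return the chosen index, or None if absent or out of range."""
    match = IDX_PATTERN.search(reply)
    if match is None:
        return None
    idx = int(match.group(1))
    return idx if 0 <= idx < n_candidates else None


reply = '```json\n{"idx": 1}\n```\nI picked the original 2002 series.'
print(extract_idx(reply, 15))  # 1
```

The range check is worth keeping: a syntactically valid answer can still point past the end of the candidate list.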

| Prompt | Before | After | Improvement |
|---|---|---|---|
| Pass 1 (extraction) | 72 false skips | 41 | -43% |
| Pass 2 (verification) | 55 false negatives | 9 (all real) | -84% |
| Pass 3 (candidate pick) | 4 parse failures | 0 | -100% |
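The improvement figures follow directly from the before and after counts. A short Python check, using only the numbers reported above:

```python
def reduction_pct(before: int, after: int) -> int:
    """Percentage reduction from `before` to `after`, rounded to an integer."""
    return round(100 * (before - after) / before)


for label, before, after in [
    ("Pass 1", 72, 41),
    ("Pass 2", 55, 9),
    ("Pass 3", 4, 0),
]:
    print(label, f"-{reduction_pct(before, after)}%")
# Pass 1 -43%
# Pass 2 -84%
# Pass 3 -100%
```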

Measure before iterating. Without 290 real entries as a benchmark, I would have iterated blindly. The numbers showed that 84% of the entries pass 2 v1 flagged as incorrect (46 of 55) were false negatives, which is impossible to see without real data.

Edge cases dominate. 46 out of 55 false negatives came from one pattern: season folders. One line in the prompt ("seasons matched to Season N are CORRECT") eliminated 84% of errors. The 80/20 rule applies to prompts too.

Parsing matters as much as the prompt. A perfect prompt is useless if parsing breaks. The AI adds text, code fences, explanations. Regex extraction is more reliable than json_decode().

Layered architecture reduces costs. Free regex handles 80%. AI only runs on the remaining 20%. Pass 3 (the most expensive) only fires when pass 2 detects a problem — 9 times out of 290 entries.

The best prompt isn't the one with the most instructions — it's the one that precisely describes edge cases. "When in doubt, skip=false" and "seasons are CORRECT" are worth more than 20 lines of generic rules.
