FunnyBench – Can AI Models Tell Funny Jokes?

wpnews.pro

cd /news/large-language-models/funnybench-can-ai-models-tell-funny-… · home › topics › large-language-models › article

[ARTICLE · art-35174] src=funnybench.lol ↗ pub=2026-06-20T22:43Z topic=large-language-models verified=true sentiment=· neutral

FunnyBench – Can AI Models Tell Funny Jokes?

A new benchmark called FunnyBench tests AI models' ability to tell funny jokes by asking each model to generate ten jokes and letting users vote on their humor. The live leaderboard uses a Bayesian score to rank models based on user votes, with the model revealed after voting. The benchmark aims to evaluate humor generation in AI, a challenging aspect of natural language processing.

read1 min views1 publishedJun 20, 2026

FunnyBench FunnyBench asks a simple question: can AI models tell funny jokes? Each model was given the same prompt — “tell me a joke” — ten times. Read a joke and decide if it’s funny or not. Your votes drive a live leaderboard. We asked each model multiple times to encourage variety, but some still repeated the same joke.

The model is revealed after you vote.

Current leaders #

Funniest joke so far

Funniest model so far

No votes yet.

Live leaderboard #

|---|

Details #

Jokes were generated through OpenRouter from its model catalog using the exact prompt “tell me a joke”. Generation used temperature 1 where supported, a 120 second timeout, provider fallback disabled, required parameters enabled, and the returned model, provider, and text were stored. Token counts and cost are stored internally but not displayed, to reduce noise. The leaderboard uses a Bayesian score: each model starts near the overall average and moves as votes come in, which makes early rankings less jumpy than a raw funny percentage. It also shows both the model requested from OpenRouter and the returned model that actually ran, so the benchmark is explicit about what was tested. For reasoning models, the lowest available reasoning setting was used; reasoning traces were intentionally not captured because they are not part of the joke shown to voters. The run excluded models not primarily meant for text, OpenRouter/router/front aliases, search or custom-tool variants, floating “latest” aliases, unavailable-price models, duplicate free aliases, invalid empty or oversized outputs, and any model that failed five calls in a row. The published set keeps ten valid jokes per retained model.

source & further reading

funnybench.lol — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/funnybench-can-ai-models…

Read original on funnybench.lol → funnybench.lol