Bluffbench

type: Organization · mentions: 1

// recent coverage: 1 mention

2026-06-20 22:10 · opensource.posit.co · large-language-models

Bluffbench is near saturation: LLMs can interpret counterintuitive plots

Bluffbench, an evaluation for LLM plot interpretation, is nearing saturation as models like Fable 5 achieve high scores on the hardest 'mocked' cases, though human-level performance remains elusive. T…

// top 4 co-occurring entities

Gemini (1), Opus (1), Fable (1), Posit (1)