22:10
2026-06-20
opensource.posit.co
large-language-models
Bluffbench is near saturation: LLMs can interpret counterintuitive plots
Bluffbench, an evaluation for LLM plot interpretation, is nearing saturation as models like Fable 5 achieve high scores on the hardest 'mocked' cases, though human-level performance remains elusive. T…