Gemini 3.5 Flash beats Opus 4.8 on bluffbench

Simon P. Couch re-ran the Bluffbench evaluation against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Gemini 3.5 Flash outperformed Opus 4.8, which showed only a modest improvement over previous Opus models. The results position Gemini 3.5 Flash as the standout model in the latest benchmark comparison.

Post by Simon P. Couch (simonpcouch.com, did:plc:bspwzx2ytje3gbvikujf2gl5), 2026-05-28T19:41:06.976Z:

Re-ran this eval against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Opus 4.8 is a modest improvement over the previously tested Opus models, but Gemini 3.5 Flash is the real stand-out

simonpcouch.github.io/bluffbench/