01:34
2026-05-29
bsky.app
artificial-intelligence
Gemini 3.5 Flash beats Opus 4.8 on bluffbench
Simon P. Couch re-ran the Bluffbench evaluation against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Gemini 3.5 Flash outperformed Opus 4.8, which showed only a modest improvement over previous Opus model…