{"slug": "gemini-3-5-flash-beats-opus-4-8-on-bluffbench", "title": "Gemini 3.5 Flash beats Opus 4.8 on bluffbench", "summary": "Simon P. Couch re-ran the Bluffbench evaluation against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Gemini 3.5 Flash outperformed Opus 4.8, which showed only a modest improvement over previous Opus models. The results position Gemini 3.5 Flash as the standout model in the latest benchmark comparison.", "body_md": "### Post\n\nSimon P. Couch\n\nsimonpcouch.com\n\ndid:plc:bspwzx2ytje3gbvikujf2gl5\n\nRe-ran this eval against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Opus 4.8 is a modest improvement over the previously tested Opus models, but Gemini 3.5 Flash is the real stand-out!\nsimonpcouch.github.io/bluffbench/\n[contains quote post or other embedded content]\n\n2026-05-28T19:41:06.976Z", "url": "https://wpnews.pro/news/gemini-3-5-flash-beats-opus-4-8-on-bluffbench", "canonical_source": "https://bsky.app/profile/simonpcouch.com/post/3mmwroep6lc2y", "published_at": "2026-05-29 01:34:09+00:00", "updated_at": "2026-05-29 01:45:52.922321+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-research"], "entities": ["Simon P. Couch", "Opus 4.8", "Gemini 3.5 Flash", "GPT 5.5", "bluffbench"], "alternates": {"html": "https://wpnews.pro/news/gemini-3-5-flash-beats-opus-4-8-on-bluffbench", "markdown": "https://wpnews.pro/news/gemini-3-5-flash-beats-opus-4-8-on-bluffbench.md", "text": "https://wpnews.pro/news/gemini-3-5-flash-beats-opus-4-8-on-bluffbench.txt", "jsonld": "https://wpnews.pro/news/gemini-3-5-flash-beats-opus-4-8-on-bluffbench.jsonld"}}