grep -l @bluffbench /news/*.json | wc -l → 1

@bluffbench

mentions: 1 · type: Organization · feed: RSS
2026-05-29 01:34 · bsky.app · artificial-intelligence

Gemini 3.5 Flash beats Opus 4.8 on bluffbench

Simon P. Couch re-ran the Bluffbench evaluation against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Gemini 3.5 Flash outperformed Opus 4.8, which showed only a modest improvement over previous Opus model…
