23:57
2026-06-18
artificialanalysis.ai
ai-research
Show HN: AA-Briefcase: a frontier knowledge work evaluation
A new evaluation benchmark, AA-Briefcase, measures frontier knowledge work performance, with models like Claude Opus 4.8 averaging 24 minutes per task and achieving an Elo of 1356, while MiniMax-M3 ta…