SWEBench-Verified

mentions: 1 | type: Organization | feed: RSS

// recent coverage: 1 mention

06:12

2026-06-09

latent.space

artificial-intelligence

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

Cognition introduced FrontierCode, a new benchmark that evaluates code on mergeability rather than just unit-test passing, with tasks built by open-source maintainers requiring over 40 hours each. The…

// co-occurs with: top 7 entities

Apple (1), Cognition (1), FrontierCode (1), FrontierMath (1), SWEBench Pro (1), OpenAI (1), METR (1)

// topics: top 5 topics

artificial intelligence (1), large language models (1), ai research (1), ai products (1), ai tools (1)