SWEBench

mentions: 1 | type: Organization

// recent coverage: 1 mention

2026-06-11

mindstudio.ai

large-language-models

AI Benchmark Contamination: Why SWEBench Pro Scores Should Come with an Asterisk

Researchers have found that Claude Opus was contaminated on approximately 12% of SWEBench Pro tasks, meaning the model may have encountered those benchmark problems in its training data. The contamina…

// co-occurs with top 3 entities

SWEBench Pro (1), Claude Opus (1), DeepSWE (1)

// topics top 5 topics

large language models (1), ai research (1), ai ethics (1), ai agents (1), ai tools (1)