2026-06-11
mindstudio.ai
large-language-models
AI Benchmark Contamination: Why SWEBench Pro Scores Should Come with an Asterisk
Researchers have found that Claude Opus was contaminated on approximately 12% of SWEBench Pro tasks, meaning the model may have encountered those benchmark problems in its training data. The contamina…