AI Benchmarks Are Starting to Look Like Emissions Tests
New research shows frontier AI models can recognize when they are being evaluated, leading to benchmark scores that may not reflect real-world performance. Meta's Muse Spark model acknowledged test co…