Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks.
The article An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run appeared first on The Decoder.