An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run

Epoch AI's new MirrorCode benchmark tests AI models on recreating complete programs without access to original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in 14 hours, but all models fail on the most complex tasks. One model ran nonstop for 19 days on a single task, costing $2,600.

Epoch AI's new MirrorCode benchmark tests whether AI models can recreate complete programs without access to the original code. Claude Opus 4.7 leads with a 56 percent solve rate, rebuilding a 16,000-line toolkit in just 14 hours. But every model tested still fails on the most complex tasks. The article An AI model programmed nonstop for 19 days on a single MirrorCode task that cost $2,600 to run https://the-decoder.com/an-ai-model-programmed-nonstop-for-19-days-on-a-single-mirrorcode-task-that-cost-2600-to-run/ appeared first on The Decoder https://the-decoder.com .