2026-06-19
code.visualstudio.com
ai-agents
What 50,000 Runs of a 5-Line Eval Taught Us
The VS Code Eval Team ran a five-line evaluation task 50,974 times across 30 models over six months, revealing significant differences in how models handle a simple file-writing request. Model-A alway…