17:13
2026-05-20
dev.to
large-language-models
Why Code Golfing is the Ultimate Test for Multimodal LLMs (And a New Benchmark to Prove It)
Introduces ClawBattle, a new open-source benchmark that tests multimodal LLMs on code golfing tasks, which require both visual and textual understanding. The article claims the benchmark avoids data contamination by usin…