18:25
2026-06-14
github.com
ai-safety
"A benchmark for catching when code doesn't do what its documentation claims"
Truth Benchmark is a new open-source benchmark for automatically detecting code that does not match the claims in its documentation. The project provides a dataset of 52 labeled examples across multiple programming…