15:56
2026-06-03
amazon.science
artificial-intelligence
Ground truth is a process, not a dataset
Amazon's AGI group found that human experts achieved only 60.8% accuracy when verifying claims from AI-generated research reports, revealing that static ground truth datasets are insufficient for eval…