04:00
2026-06-03
arxiv.org
natural-language-processing
Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling
A systematic human audit of FOLIO and MALLS, two benchmarks for translating natural language to first-order logic (NL-to-FOL), found that approximately 39% and 36% of their entries, respectively, contain incorrect FOL formalizations, with additional errors in ambi…