cd /news/large-language-models/evaluating-the-robustness-of-proof-a… · home topics large-language-models article
[ARTICLE · art-28950] src=arxiv.org ↗ pub= topic=large-language-models verified=true sentiment=· neutral

Evaluating the Robustness of Proof Autoformalization in Lean 4

Researchers at UC Riverside introduced the first robustness study for proof autoformalization in Lean 4, testing seven LLM-based models on perturbed informal proofs. All models showed sensitivity to global paraphrasing and mostly failed to faithfully reflect local perturbations, highlighting a critical gap in current autoformalization systems.

read1 min views1 publishedJun 16, 2026

arXiv:2606.14867v1 Announce Type: new Abstract: Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean~4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.

── more in #large-language-models 4 stories · sorted by recency
── more on @lean 4 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/evaluating-the-robus…] indexed:0 read:1min 2026-06-16 ·