Evaluating the Robustness of Proof Autoformalization in Lean 4

wpnews.pro

cd /news/large-language-models/evaluating-the-robustness-of-proof-a… · home › topics › large-language-models › article

[ARTICLE · art-28950] src=arxiv.org ↗ pub=2026-06-16T04:00Z topic=large-language-models verified=true sentiment=· neutral

Evaluating the Robustness of Proof Autoformalization in Lean 4

Researchers at UC Riverside introduced the first robustness study for proof autoformalization in Lean 4, testing seven LLM-based models on perturbed informal proofs. All models showed sensitivity to global paraphrasing and mostly failed to faithfully reflect local perturbations, highlighting a critical gap in current autoformalization systems.

read1 min views30 publishedJun 16, 2026

arXiv:2606.14867v1 Announce Type: new Abstract: Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean~4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.

source & further reading

arxiv.org — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/evaluating-the-robustnes…

Read original on arxiv.org → arxiv.org/abs/2606.14867

mentioned entities

Lean 4

UC Riverside

miniF2F

MATH-500

metadata

slugevaluating-the-robustness-of-proof-autoformalization-in-lean-4

topic#large-language-models

secondary2 topics

sentimentneutral

canonicalarxiv.org

navigation

← prevShould you buy a Mac mini now or…

next →Could a diamond wafer as wide as…

── more in #large-language-models 4 stories · sorted by recency

sourcefeed.dev · 2 Aug · #large-language-models

Astra's Real Breakthrough Is the Lean Receipts

mlq.ai · 2 Aug · #large-language-models

OpenAI publishes ten claimed math advances, with formal peer review still pending

dev.to · 2 Aug · #large-language-models

Wiring SlopScan into Claude Code — A Skill, a Hook, and a Bug I Almost Shipped

dev.to · 2 Aug · #large-language-models

A.I Foundational Guideline for Developers

── more on @lean 4 3 stories trending now

wpnews · 1 Aug · #ai-products

OpenAI Atlas Shuts Down August 9: Migration Guide

wpnews · 1 Aug · #ai-agents

Quality Isn't Accidental — Maker/Checker Separation and Automated Validation

wpnews · 2 Aug · #developer-tools

Agent-Browser – Browser Automation for AI

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required