
LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

Researchers have developed LeanMarathon, a multi-agent system for autoformalizing research-level mathematical proofs in the Lean theorem prover. It breaks one long-horizon task into many parallel, recoverable transactions. Across three autonomous runs, the system formalized all seven target theorems from two recent research papers covering four Erdős problems, leaving no sorry placeholders (Lean's marker for an unfinished proof) and proving 258 lemmas and theorems. The authors conclude that durable coordination harnesses, not just stronger provers, are needed for AI to serve as a reliable co-mathematician in long mathematical developments.

Published Jun 6, 2026 · 1 min read

arXiv:2606.05400v1

Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.
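
The abstract's idea of discharging a proof DAG "from its dynamic leaves upward" in parallel rounds can be illustrated with a small scheduling sketch. This is not LeanMarathon's actual code: the function name discharge_in_rounds, the deps mapping, and the prove callback are all hypothetical, and prove merely stands in for whatever a prover agent plus a CI check would do. In each round, every node whose dependencies are already proven counts as a current leaf and is attempted in parallel; only successes are committed, so a failed attempt leaves earlier work untouched.

```python
from concurrent.futures import ThreadPoolExecutor


def discharge_in_rounds(deps, prove):
    """Discharge a proof DAG leaf-first in parallel rounds (illustrative only).

    deps  : dict mapping each node to the set of nodes it depends on
    prove : hypothetical callback, node -> bool, standing in for a prover
            agent followed by a CI gate
    Returns (rounds, unproven): the nodes committed in each round, and the
    set of nodes that could not be discharged.
    """
    proven = set()
    remaining = set(deps)
    rounds = []
    while remaining:
        # "Dynamic leaves": nodes whose dependencies are all proven already.
        leaves = sorted(n for n in remaining if deps[n] <= proven)
        if not leaves:
            raise ValueError("cycle or missing dependency in proof graph")
        # Attempt all current leaves in parallel.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(prove, leaves))
        succeeded = [n for n, ok in zip(leaves, results) if ok]
        if not succeeded:
            break  # no progress; a real harness might hand off to a repair step
        # Commit only the gated successes; failures stay in `remaining`.
        proven.update(succeeded)
        remaining.difference_update(succeeded)
        rounds.append(succeeded)
    return rounds, remaining


# Toy proof graph with made-up lemma names.
toy_deps = {
    "lemma_a": set(),
    "lemma_b": set(),
    "lemma_c": {"lemma_a", "lemma_b"},
    "main_theorem": {"lemma_c"},
}
rounds, unproven = discharge_in_rounds(toy_deps, lambda node: True)
print(rounds)    # [['lemma_a', 'lemma_b'], ['lemma_c'], ['main_theorem']]
print(unproven)  # set()
```

If the callback fails on lemma_c, the first round still commits lemma_a and lemma_b, and the function returns lemma_c and main_theorem as unproven. That mirrors, in miniature, the abstract's point about local, recoverable transactions replacing one brittle long run.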
