LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

wpnews.pro

[ARTICLE · art-23125] src=arxiv.org pub=2026-06-06T04:00Z topic=artificial-intelligence verified=true sentiment=↑ positive

Researchers have developed LeanMarathon, a multi-agent system that autoformalizes research-level mathematical proofs in the Lean theorem prover by breaking one long-horizon task into many parallel, recoverable transactions. Across three autonomous runs on two recent research papers spanning four Erdős problems, the system formalized all seven target theorems with no remaining "sorry" (Lean's placeholder for an unfinished proof), proving 258 lemmas and theorems in total. The authors conclude that durable coordination harnesses, not just stronger provers, are needed for AI to serve as a reliable co-mathematician across long mathematical developments.

1 min read · published Jun 6, 2026

arXiv:2606.05400v1. Abstract: Long-horizon autoformalization of research mathematics fails not only at hard lemmas, but at scale: statements drift, dependencies tangle, context decays, and local repairs corrupt distant work. We present LeanMarathon, a multi-agent harness for reliable research-level Lean autoformalization. Its core abstraction is an evolving blueprint: a Lean file that serves simultaneously as formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents construct, audit, prove, and repair this blueprint. These agents are coordinated by a two-stage orchestrator that first stabilizes target fidelity through adversarial review and then discharges the proof directed acyclic graph (DAG) from its dynamic leaves upward in parallel CI-gated rounds. LeanMarathon turns one brittle multi-hour run into many local, recoverable, parallel transactions. We evaluate LeanMarathon on two recent research papers spanning four Erdős problems (#1051, #1196, #164, #1217). Across three autonomous runs, it formalizes all seven target theorems with no sorry, proving 258 lemmas and theorems. These results show that reliable AI co-mathematics requires not only stronger provers, but durable harnesses that preserve target fidelity across long mathematical developments. The code can be found at https://github.com/YuanheZ/LeanMarathon.
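The idea of discharging a proof DAG "from its dynamic leaves upward" in gated rounds can be illustrated with a small Python sketch. This is not code from the LeanMarathon repository: the function names (dynamic_leaves, discharge, flaky_prover), the lemma names, and the retry behavior are all assumptions made for illustration, and try_prove is a hypothetical stand-in for a prover agent followed by a CI check. A node counts as a leaf once every lemma it depends on is proved, so the set of leaves changes from round to round. A failed attempt simply leaves its node open for a later round, which is one plausible reading of "local, recoverable transactions".

```python
from concurrent.futures import ThreadPoolExecutor


def dynamic_leaves(deps, proved):
    """Return the open nodes whose dependencies are all proved already."""
    return sorted(
        name
        for name, needs in deps.items()
        if name not in proved and all(d in proved for d in needs)
    )


def discharge(deps, try_prove, max_rounds=10):
    """Prove a dependency DAG bottom-up in parallel rounds.

    deps maps each node to the nodes it depends on. try_prove(name) is a
    hypothetical callback returning True when the proof attempt passes
    the gate. A failure leaves the node open, so it is retried later.
    """
    proved = set()
    rounds = []
    for _ in range(max_rounds):
        leaves = dynamic_leaves(deps, proved)
        if not leaves:
            break  # nothing left to attempt
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(try_prove, leaves))
        # Gate: only attempts that passed are merged into the shared state.
        accepted = [name for name, ok in zip(leaves, results) if ok]
        proved.update(accepted)
        rounds.append(accepted)
    return proved, rounds


# Hypothetical four-node proof graph.
deps = {
    "lemma_a": [],
    "lemma_b": [],
    "lemma_c": ["lemma_a", "lemma_b"],
    "main_theorem": ["lemma_c"],
}

attempts = {}


def flaky_prover(name):
    """Stand-in prover: fails on its first attempt at lemma_b only."""
    attempts[name] = attempts.get(name, 0) + 1
    return not (name == "lemma_b" and attempts[name] == 1)


proved, rounds = discharge(deps, flaky_prover)
print(rounds)
```

In this toy run the failed first attempt at lemma_b costs one extra round but does not disturb lemma_a, which was already accepted. A real harness would also need to handle statement changes and repairs, which this sketch leaves out.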

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.05400
