04:00
2026-06-15
arxiv.org
large-language-models
MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis
Researchers introduced MA-ProofBench, the first formal theorem-proving benchmark for mathematical analysis, containing 200 problems across two difficulty levels. Evaluations of leading LLMs showed poo…