TheoremGraph: Search 18M+ Mathematical Dependencies Researchers at the University of Washington released TheoremGraph, a unified dependency graph spanning 18 million+ mathematical statements from arXiv papers and the Lean formal proof assistant. The graph enables mathematicians to detect non-novel results and AI agents to navigate mathematical dependencies across informal and formal spaces. Search 18 million+ mathematical dependencies A unified statement-level dependency graph spanning both informal and formal mathematics, including 11.7 million arXiv statements linked to Mathlib through a shared embedding space. Simon Kurgan https://www.linkedin.com/in/simon-kurgan/ , Evan Wang https://github.com/aurasoph/ , Eric Leonen https://github.com/ericleonen , Sophie Szeto https://www.linkedin.com/in/sophie-szeto/ , Luke Alexander https://www.linkedin.com/in/lukealexanderluke/ , Artemii Remizov https://www.linkedin.com/in/artemii-remizov-62783631b/ , Jarod Alper https://sites.math.washington.edu//~jarod/ , Giovanni Inchiostro https://sites.math.washington.edu/~ginchios/ , Vasily Ilin https://vilin97.github.io/ Motivation Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly. Informal papers cite mostly at the document level, while formal proof assistants like Lean record fine-grained dependencies over a much smaller body of mathematics. This asymmetry limits attribution, duplication detection, and automated formalization. For mathematicians , a unified dependency graph makes it possible to check whether a result is already known and trace exactly which lemmas a proof relies on. A 2024 study https://arxiv.org/abs/2412.03775 found that 2.4% of withdrawn arXiv submissions were self-identified as non-novel, which might have been identified earlier with improved cross-paper dependency tracking. For AI agents , the graph gives neural theorem provers and autoformalization tools a access to mathematics as a graph. Instead of retrieving flat semantic neighbors, an agent can walk the graph to find connected lemmas and related formalizations across informal and formal spaces. How It Works - 1 Parse informal statements. We extract over 11.7 million theorem-like environments from mathematics arXiv papers using a regex-based parser, and recover 18.3 million candidate directed dependency edges. We use deterministic, heuristic, and notation-based parsing methods, and label each edge with the parser type. - 2 Extract the formal graph. LeanGraph extracts typed declaration-level dependencies from Mathlib4 and 25 open-source Lean projects, yielding 388,105 nodes and 11.3 million typed edges across six semantic categories. - 3 Generate slogans and embed. Every formal and informal statement we extract is summarized into a concise natural-language slogan by Qwen3-235B and embedded with Qwen3-Embedding-8B into a shared semantic space. - 4 Bridging the corpora. A cross-modal nearest-neighbor sweep proposes informal, formal candidate matches, and a GPT-5.4 judge affirms 47,952 matches above a 0.8 cosine similarity floor. Overview Informal Dependency Extractors | EXTRACTOR | EDGES | PRECISION | |---|---|---| | Deterministic | 5.23M | 98.8% | | Heuristic | 6.47M | 76.6% | | Notation | 7.88M | 42.7% | | Any total | 18.3M | 68.1% | Precision estimated by LLM judge Kimi K2.5 on 500 sampled arXiv papers. Formal Edge Types | TYPE | DESCRIPTION | |---|---| | proof | Used inside a theorem proof term | | sig | Appears in a type signature | | def | Used in a non-Prop definition body | | field | Referenced in a structure field type | | extends | Structure or class inheritance | | docref | Backtick reference in a docstring | LeanGraph extracts 388,105 nodes and 11.3M typed edges across 25 Lean projects Mathlib v4.27-v4.29 plus 24 community formalizations . REST API TheoremGraph provides a REST API for semantic search and dependency graph traversal. curl "https://api.theoremsearch.com/graph/paper?external id=2301.00001" Returns all statements and dependency edges for an arXiv paper or Lean repository. Full API reference → /docs MCP Tool TheoremGraph is also available as an MCP tool for AI agents via a single theorem search tool. https://api.theoremsearch.com/mcp