{"slug": "theoremgraph-search-18m-mathematical-dependencies", "title": "TheoremGraph: Search 18M+ Mathematical Dependencies", "summary": "Researchers at the University of Washington released TheoremGraph, a unified dependency graph spanning 18 million+ mathematical statements from arXiv papers and the Lean formal proof assistant. The graph enables mathematicians to detect non-novel results and AI agents to navigate mathematical dependencies across informal and formal spaces.", "body_md": "# Search 18 million+\n\nmathematical dependencies\n\nA unified statement-level dependency graph spanning both informal and formal mathematics, including 11.7 million arXiv statements linked to Mathlib through a shared embedding space.\n\n[Simon Kurgan](https://www.linkedin.com/in/simon-kurgan/), [Evan Wang](https://github.com/aurasoph/), [Eric Leonen](https://github.com/ericleonen), [Sophie Szeto](https://www.linkedin.com/in/sophie-szeto/), [Luke Alexander](https://www.linkedin.com/in/lukealexanderluke/), [Artemii Remizov](https://www.linkedin.com/in/artemii-remizov-62783631b/), [Jarod Alper](https://sites.math.washington.edu//~jarod/), [Giovanni Inchiostro](https://sites.math.washington.edu/~ginchios/), [Vasily Ilin](https://vilin97.github.io/)\n\n## Motivation\n\nMathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly. Informal papers cite mostly at the document level, while formal proof assistants like Lean record fine-grained dependencies over a much smaller body of mathematics. This asymmetry limits attribution, duplication detection, and automated formalization.\n\n**For mathematicians**, a unified dependency graph makes it possible to check whether a result is already known and trace exactly which lemmas a proof relies on. A[ 2024 study](https://arxiv.org/abs/2412.03775) found that 2.4% of withdrawn arXiv submissions were self-identified as non-novel, which might have been identified earlier with improved cross-paper dependency tracking.\n\n**For AI agents**, the graph gives neural theorem provers and autoformalization tools a access to mathematics as a graph. Instead of retrieving flat semantic neighbors, an agent can walk the graph to find connected lemmas and related formalizations across informal and formal spaces.\n\n## How It Works\n\n- 1\n**Parse informal statements.** We extract over 11.7 million theorem-like environments from mathematics arXiv papers using a regex-based parser, and recover 18.3 million candidate directed dependency edges. We use deterministic, heuristic, and notation-based parsing methods, and label each edge with the parser type. - 2\n**Extract the formal graph.** LeanGraph extracts typed declaration-level dependencies from Mathlib4 and 25 open-source Lean projects, yielding 388,105 nodes and 11.3 million typed edges across six semantic categories. - 3\n**Generate slogans and embed.** Every formal and informal statement we extract is summarized into a concise natural-language slogan by Qwen3-235B and embedded with Qwen3-Embedding-8B into a shared semantic space. - 4\n**Bridging the corpora.** A cross-modal nearest-neighbor sweep proposes (informal, formal) candidate matches, and a GPT-5.4 judge affirms 47,952 matches above a 0.8 cosine similarity floor.\n\n## Overview\n\n### Informal Dependency Extractors\n\n| EXTRACTOR | EDGES | PRECISION |\n|---|---|---|\n| Deterministic | 5.23M | 98.8% |\n| Heuristic | 6.47M | 76.6% |\n| Notation | 7.88M | 42.7% |\n| Any (total) | 18.3M | 68.1% |\n\nPrecision estimated by LLM judge (Kimi K2.5) on 500 sampled arXiv papers.\n\n### Formal Edge Types\n\n| TYPE | DESCRIPTION |\n|---|---|\n| proof | Used inside a theorem proof term |\n| sig | Appears in a type signature |\n| def | Used in a non-Prop definition body |\n| field | Referenced in a structure field type |\n| extends | Structure or class inheritance |\n| docref | Backtick reference in a docstring |\n\nLeanGraph extracts 388,105 nodes and 11.3M typed edges across 25 Lean projects (Mathlib v4.27-v4.29 plus 24 community formalizations).\n\n## REST API\n\nTheoremGraph provides a REST API for semantic search and dependency graph traversal.\n\n```\ncurl \"https://api.theoremsearch.com/graph/paper?external_id=2301.00001\"\n```\n\nReturns all statements and dependency edges for an arXiv paper or Lean repository. [Full API reference →](/docs)\n\n## MCP Tool\n\nTheoremGraph is also available as an MCP tool for AI agents via a single `theorem_search`\n\ntool.\n\n`https://api.theoremsearch.com/mcp`", "url": "https://wpnews.pro/news/theoremgraph-search-18m-mathematical-dependencies", "canonical_source": "https://www.theoremsearch.com/theorem-graph", "published_at": "2026-06-30 11:14:59+00:00", "updated_at": "2026-06-30 11:20:51.802353+00:00", "lang": "en", "topics": ["ai-research", "ai-tools", "large-language-models", "natural-language-processing", "ai-infrastructure"], "entities": ["TheoremGraph", "arXiv", "Mathlib", "Lean", "Simon Kurgan", "Evan Wang", "University of Washington", "Qwen3"], "alternates": {"html": "https://wpnews.pro/news/theoremgraph-search-18m-mathematical-dependencies", "markdown": "https://wpnews.pro/news/theoremgraph-search-18m-mathematical-dependencies.md", "text": "https://wpnews.pro/news/theoremgraph-search-18m-mathematical-dependencies.txt", "jsonld": "https://wpnews.pro/news/theoremgraph-search-18m-mathematical-dependencies.jsonld"}}