cd /news/ai-research/theoremgraph-search-18m-mathematical… · home topics ai-research article
[ARTICLE · art-44701] src=theoremsearch.com ↗ pub= topic=ai-research verified=true sentiment=↑ positive

TheoremGraph: Search 18M+ Mathematical Dependencies

Researchers at the University of Washington released TheoremGraph, a unified dependency graph spanning 18 million+ mathematical statements from arXiv papers and the Lean formal proof assistant. The graph enables mathematicians to detect non-novel results and AI agents to navigate mathematical dependencies across informal and formal spaces.

read3 min views1 publishedJun 30, 2026

mathematical dependencies

A unified statement-level dependency graph spanning both informal and formal mathematics, including 11.7 million arXiv statements linked to Mathlib through a shared embedding space.

Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, Vasily Ilin

Motivation #

Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly. Informal papers cite mostly at the document level, while formal proof assistants like Lean record fine-grained dependencies over a much smaller body of mathematics. This asymmetry limits attribution, duplication detection, and automated formalization.

For mathematicians, a unified dependency graph makes it possible to check whether a result is already known and trace exactly which lemmas a proof relies on. A 2024 study found that 2.4% of withdrawn arXiv submissions were self-identified as non-novel, which might have been identified earlier with improved cross-paper dependency tracking.

For AI agents, the graph gives neural theorem provers and autoformalization tools a access to mathematics as a graph. Instead of retrieving flat semantic neighbors, an agent can walk the graph to find connected lemmas and related formalizations across informal and formal spaces.

How It Works #

  • 1 Parse informal statements. We extract over 11.7 million theorem-like environments from mathematics arXiv papers using a regex-based parser, and recover 18.3 million candidate directed dependency edges. We use deterministic, heuristic, and notation-based parsing methods, and label each edge with the parser type. - 2 Extract the formal graph. LeanGraph extracts typed declaration-level dependencies from Mathlib4 and 25 open-source Lean projects, yielding 388,105 nodes and 11.3 million typed edges across six semantic categories. - 3 Generate slogans and embed. Every formal and informal statement we extract is summarized into a concise natural-language slogan by Qwen3-235B and embedded with Qwen3-Embedding-8B into a shared semantic space. - 4 Bridging the corpora. A cross-modal nearest-neighbor sweep proposes (informal, formal) candidate matches, and a GPT-5.4 judge affirms 47,952 matches above a 0.8 cosine similarity floor.

Overview #

Informal Dependency Extractors

EXTRACTOR EDGES PRECISION
Deterministic 5.23M 98.8%
Heuristic 6.47M 76.6%
Notation 7.88M 42.7%
Any (total) 18.3M 68.1%

Precision estimated by LLM judge (Kimi K2.5) on 500 sampled arXiv papers.

Formal Edge Types

TYPE DESCRIPTION
proof Used inside a theorem proof term
sig Appears in a type signature
def Used in a non-Prop definition body
field Referenced in a structure field type
extends Structure or class inheritance
docref Backtick reference in a docstring

LeanGraph extracts 388,105 nodes and 11.3M typed edges across 25 Lean projects (Mathlib v4.27-v4.29 plus 24 community formalizations).

REST API #

TheoremGraph provides a REST API for semantic search and dependency graph traversal.

curl "https://api.theoremsearch.com/graph/paper?external_id=2301.00001"

Returns all statements and dependency edges for an arXiv paper or Lean repository. Full API reference →

MCP Tool #

TheoremGraph is also available as an MCP tool for AI agents via a single theorem_search

tool.

https://api.theoremsearch.com/mcp

── more in #ai-research 4 stories · sorted by recency
── more on @theoremgraph 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/theoremgraph-search-…] indexed:0 read:3min 2026-06-30 ·