Source: graphify.net

Graphify – Open-Source Knowledge Graph Skill for AI Coding Assistants

Safi Shamsi released Graphify, an open-source skill that builds queryable knowledge graphs from multi-modal codebases for AI coding assistants like Claude Code and OpenAI Codex. The tool combines Tree-sitter static analysis with LLM-driven semantic extraction to help assistants understand code structure and design rationale. The project lists 3.7k+ GitHub stars and reports a 71.5× reduction in tokens per query on its mixed code-and-paper corpus.

Published Jun 27, 2026 · 5 min read

Graphify is an open-source skill that helps AI coding assistants understand multi-modal codebases by building a queryable knowledge graph from code, docs, papers and diagrams.

pip install graphifyy

What is Graphify? #

Graphify is a multi-modal knowledge graph builder created for AI coding assistants such as Claude Code, OpenAI Codex and OpenCode. By combining Tree-sitter static analysis with LLM-driven semantic extraction, Graphify turns an entire repository, including source code, documentation, research papers and diagrams, into an interactive graph that explains both what the code does and why it was designed that way. The project is maintained by Safi Shamsi, released under the permissive MIT license, and built on widely trusted libraries including NetworkX and Tree-sitter.

3.7k+ GitHub Stars

MIT License

71.5× Token Reduction

Python 3.10+ Runtime

Core Capabilities #

Graphify unifies static analysis, semantic extraction and graph clustering into a single skill that any AI coding assistant can invoke.

Multi-Modal Extraction

Parses code (.py, .js, .go, .java, …), Markdown, PDFs and images. Tree-sitter extracts ASTs, call graphs and docstrings; LLMs extract concepts from prose; vision models read diagrams.
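As a rough illustration of the static-analysis half of this step, the sketch below uses Python's built-in `ast` module as a stand-in for Tree-sitter. That substitution is an assumption made to keep the example dependency-free; the `extract` function and the node/edge record shapes are hypothetical and not taken from Graphify.

```python
import ast

def extract(source: str, filename: str) -> tuple[list[dict], list[dict]]:
    """Return (nodes, edges) for the functions defined in one Python file."""
    tree = ast.parse(source, filename=filename)
    nodes, edges = [], []
    for fn in ast.walk(tree):
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        nodes.append({
            "id": f"{filename}::{fn.name}",
            "kind": "function",
            "doc": ast.get_docstring(fn) or "",
        })
        # Record a "calls" edge for every direct call to a plain name.
        for call in ast.walk(fn):
            if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                edges.append({
                    "source": f"{filename}::{fn.name}",
                    "target": call.func.id,
                    "relation": "calls",
                })
    return nodes, edges

SAMPLE = '''
def fetch(url):
    """Download a URL."""
    return parse(read(url))

def parse(text):
    return text.strip()
'''

nodes, edges = extract(SAMPLE, "client.py")
```

Method calls such as `text.strip()` are skipped here because resolving them needs type information; a real extractor would have to decide how to handle them.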

Knowledge Graph Build

Merges all extracted nodes and edges into a NetworkX graph and applies the Leiden algorithm for semantic community detection — no vector embeddings required.
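The merge described above can be pictured with a minimal sketch that uses plain dictionaries in place of a NetworkX graph, so it runs without dependencies. `merge_extractions` and the record shapes are hypothetical, and the Leiden clustering pass is not shown.

```python
from collections import defaultdict

def merge_extractions(*batches):
    """Merge (nodes, edges) batches from several extractors into one graph.

    Nodes with the same id are combined; duplicate edges are collapsed.
    """
    nodes: dict[str, dict] = {}
    adjacency: dict[str, set[str]] = defaultdict(set)
    for batch_nodes, batch_edges in batches:
        for node in batch_nodes:
            # Later extractors may add attributes to an existing node.
            nodes.setdefault(node["id"], {}).update(node)
        for source, target in batch_edges:
            nodes.setdefault(source, {"id": source})
            nodes.setdefault(target, {"id": target})
            adjacency[source].add(target)
            adjacency[target].add(source)  # treat the graph as undirected
    return nodes, adjacency

# Toy batches: one from static analysis, one from prose extraction.
ast_batch = ([{"id": "Client", "kind": "class"}], [("Client", "Response")])
llm_batch = ([{"id": "Client", "summary": "HTTP entry point"}],
             [("Client", "Response"), ("Client", "retry policy")])

nodes, adjacency = merge_extractions(ast_batch, llm_batch)
```

Keying nodes by id is what lets a code-derived node and a prose-derived concept end up as one vertex carrying both sets of attributes.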

God Nodes & Surprises

Identifies the highest-degree "god nodes" at the heart of the system and flags unexpected cross-file or cross-domain connections worth investigating.
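A minimal sketch of both ideas, assuming a graph given as an edge list: god nodes are ranked by degree, and any edge whose endpoints live in different files is treated as a candidate surprise. How Graphify actually scores surprise is not described here; the function names, edges and file names are invented for illustration.

```python
def god_nodes(edges, top=3):
    """Rank nodes by degree (number of incident edges), highest first."""
    degree = {}
    for source, target in edges:
        degree[source] = degree.get(source, 0) + 1
        degree[target] = degree.get(target, 0) + 1
    # Break ties alphabetically so the ranking is deterministic.
    return sorted(degree, key=lambda n: (-degree[n], n))[:top]

def cross_file_edges(edges, file_of):
    """Return edges whose endpoints live in different files."""
    return [(s, t) for s, t in edges if file_of[s] != file_of[t]]

# Toy data, loosely named after an HTTP client library.
edges = [
    ("Client", "Request"), ("Client", "Response"), ("Client", "Transport"),
    ("AsyncClient", "Request"), ("AsyncClient", "Response"),
    ("DigestAuth", "Response"), ("Request", "Response"),
]
file_of = {
    "Client": "client.py", "AsyncClient": "client.py",
    "Request": "models.py", "Response": "models.py",
    "Transport": "transport.py", "DigestAuth": "auth.py",
}

top_nodes = god_nodes(edges, top=3)
candidates = cross_file_edges(edges, file_of)
```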

Interactive Outputs

Exports an interactive graph.html, a queryable graph.json, and a human-readable GRAPH_REPORT.md audit report.

Assistant Integration

Ships with /graphify, /graphify query, /graphify path and /graphify explain commands for Claude Code, Codex, OpenCode and more.
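How the path command is implemented is not stated here. One plausible reading is a shortest-path lookup between two nodes, which the sketch below shows with a breadth-first search over an adjacency dictionary. The `shortest_path` helper and the toy graph are assumptions for illustration only.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None

adjacency = {
    "DigestAuth": ["Auth", "Response"],
    "Auth": ["DigestAuth", "Client"],
    "Response": ["DigestAuth", "Client"],
    "Client": ["Auth", "Response", "Transport"],
    "Transport": ["Client"],
}
path = shortest_path(adjacency, "DigestAuth", "Transport")
```

The returned chain of nodes is the kind of compact answer an assistant can read instead of opening every file along the way.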

Secure by Design

Strict input validation: only http/https URLs, size and timeout limits, path containment, HTML-escaped node labels — defending against SSRF, injection and XSS.
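Three of the checks listed above can be sketched with the Python standard library alone. The function names are hypothetical and this is not Graphify's `security.py`; it only shows the general shape of a scheme allow-list, path containment and label escaping.

```python
import html
from pathlib import Path
from urllib.parse import urlparse

def is_allowed_url(url: str) -> bool:
    """Accept only http/https URLs that name a host."""
    parsed = urlparse(url)
    return parsed.scheme in {"http", "https"} and bool(parsed.hostname)

def contained_path(base: Path, candidate: str) -> Path:
    """Resolve candidate under base, rejecting anything that escapes it."""
    base = base.resolve()
    target = (base / candidate).resolve()
    if not target.is_relative_to(base):
        raise ValueError(f"path escapes output directory: {candidate}")
    return target

def safe_label(label: str) -> str:
    """Escape a node label before embedding it in HTML."""
    return html.escape(label, quote=True)

ok = is_allowed_url("https://example.com/paper.pdf")
blocked = is_allowed_url("file:///etc/passwd")
label = safe_label("<script>")
```

A scheme check by itself does not stop requests to internal addresses, so a complete SSRF defense would also need to inspect the resolved host.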

Architecture & Pipeline #

Graphify is a multi-stage pipeline. Each stage is an isolated module so contributors can extend any step independently.

detect: collect files

extract: AST + LLM nodes/edges

build: NetworkX graph

cluster: Leiden communities

analyze: god nodes & surprises

report: GRAPH_REPORT.md

export: HTML / JSON / Obsidian
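The stage list above suggests a pipeline of isolated steps run in order. The following sketch shows one way such a design could be wired, with each stage as a function that takes and returns a state dictionary. The `run_pipeline` helper and the placeholder stage bodies are hypothetical; only the stage names come from the list.

```python
from typing import Any, Callable

Stage = Callable[[dict[str, Any]], dict[str, Any]]

def run_pipeline(stages: list[tuple[str, Stage]], state: dict[str, Any]) -> dict[str, Any]:
    """Run each named stage in order, passing a shared state dict along."""
    for name, stage in stages:
        state = stage(state)
        state.setdefault("completed", []).append(name)
    return state

# Placeholder stages: each one only adds its own key to the state.
def detect(state):
    return {**state, "files": ["a.py", "notes.md"]}

def extract(state):
    return {**state, "edges": [("a.py", "notes.md")]}

def build(state):
    return {**state, "graph": {"a.py": {"notes.md"}, "notes.md": {"a.py"}}}

result = run_pipeline(
    [("detect", detect), ("extract", extract), ("build", build)],
    {"root": "./raw"},
)
```

Because every stage shares one signature, a contributor could swap or insert a step by editing the list rather than the runner.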

Supporting modules include ingest.py for URL fetching, cache.py for semantic caching, security.py for input validation, watch.py for live updates and serve.py for an MCP-protocol service.
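What the caching module does internally is not described here. A common way to avoid repeating expensive extraction on unchanged inputs is to key results by a hash of the content, as in this sketch. `ContentCache` and `fake_extract` are hypothetical names, and the approach itself is an assumption rather than Graphify's documented design.

```python
import hashlib

class ContentCache:
    """Cache extraction results keyed by the SHA-256 of the file content."""

    def __init__(self):
        self._store: dict[str, object] = {}
        self.misses = 0

    def get_or_compute(self, content: bytes, compute):
        key = hashlib.sha256(content).hexdigest()
        if key not in self._store:
            # Only unseen content triggers the (expensive) extraction call.
            self.misses += 1
            self._store[key] = compute(content)
        return self._store[key]

def fake_extract(content: bytes) -> dict:
    """Stand-in for an expensive extraction step."""
    return {"words": len(content.split())}

cache = ContentCache()
first = cache.get_or_compute(b"attention is all you need", fake_extract)
again = cache.get_or_compute(b"attention is all you need", fake_extract)
changed = cache.get_or_compute(b"attention is all you need, mostly", fake_extract)
```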

Install & Run #

Graphify is distributed on PyPI. The package name is graphifyy; the CLI command remains graphify.

pip install graphifyy && graphify install

/graphify ./raw

graphify-out/
├── graph.html        # interactive visualization
├── GRAPH_REPORT.md   # core nodes, surprises, suggested questions
├── graph.json        # persistent, queryable graph
└── cache/            # incremental cache

Graphify does not bundle an LLM. It uses the model API key already configured by your AI coding assistant (Claude, Codex, etc.) and only sends semantic content — never raw source code — to the upstream model.

Worked Examples #

The repository ships with reproducible corpora demonstrating Graphify on both small libraries and large mixed code-and-paper collections.

httpx (small)

6 Python files modeling an HTTP transport layer. Result: 144 nodes, 330 edges, 6 communities. God nodes: Client, AsyncClient, Response, Request. Surprise edge: DigestAuth → Response.

Karpathy mixed corpus

3 GPT framework repos + 5 attention papers + 4 diagrams (~52 files, ~92k words). Result: 285 nodes, 340 edges, 53 communities. Average query cost ~1.7k tokens vs ~123k naive — a 71.5× reduction.
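The reduction factor is simply the naive token count divided by the graph-query token count. Plugging in the rounded figures quoted above gives roughly 72×, in line with the reported 71.5×, which was presumably computed from unrounded counts. The helper name below is hypothetical.

```python
def reduction_factor(naive_tokens: float, graph_tokens: float) -> float:
    """How many times fewer tokens a graph query uses than the naive approach."""
    return naive_tokens / graph_tokens

# Rounded figures quoted for the mixed corpus: ~123k naive vs ~1.7k per query.
mixed = reduction_factor(123_000, 1_700)
print(f"{mixed:.1f}x")
```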

Comparison #

How Graphify relates to adjacent open-source projects in the code-intelligence space.

Project | Focus | Strength | Limitation vs Graphify
Sourcegraph | Cross-repo code search | Enterprise-grade navigation | Not a knowledge graph; limited design semantics
Code2Vec | Function-level embeddings | Vector retrieval & classification | No graph structure, no multi-modal input
Neo4j | General graph database | Powerful Cypher queries | Does not generate graphs from code itself

Security, Licensing & Trust #

Graphify is released under the MIT License. Its core dependencies, NetworkX (BSD) and Tree-sitter (MIT), both carry permissive open-source licenses with no conflicts. The project performs no telemetry. The only outbound network call is the semantic-extraction step, which uses your own configured AI model API key; only semantic descriptions of documents are transmitted, never raw source code. URLs are restricted to http/https, downloads are size- and time-bounded, output paths are containment-checked, and node labels are HTML-escaped to prevent SSRF, Cypher injection and XSS.

Learn more about Graphify #

Deeper guides on how Graphify builds, clusters and serves knowledge graphs to AI coding assistants.

Knowledge Graphs for AI Coding Assistants

Why structural graphs beat vector RAG for code understanding.

Tree-sitter AST Extraction

How Graphify parses 19 languages locally, with no LLM calls on source.

Leiden Community Detection

Clustering on graph topology alone — no embeddings, no vector store.

Claude Code Integration

CLAUDE.md directives and the PreToolUse hook, step by step.

CLI Command Reference

Every /graphify and graphify command in one place.

Graphify vs Alternatives

Honest comparison against Sourcegraph, Code2Vec and Neo4j.

Frequently Asked Questions #

Does Graphify send my code to a third-party model? #

No. Graphify only sends semantic descriptions of documents and diagrams to the AI model you have already configured in your assistant — never raw source files.

Which AI coding assistants are supported? #

Claude Code, OpenAI Codex and OpenCode are supported out of the box via dedicated skill-*.md manifests. Any assistant that can call shell commands can invoke graphify.

How large a codebase can Graphify handle? #

Tree-sitter parsing and NetworkX construction scale linearly with code size. On a ~500k-word corpus, BFS subgraph queries stay around ~2k tokens versus ~670k naive — preserving compression at scale.
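The compression comes from answering a question with a small neighborhood of the graph rather than the whole corpus. A depth-limited breadth-first search, sketched below, is one way to collect such a neighborhood. The `bfs_subgraph` helper, the depth limit and the toy graph are assumptions for illustration.

```python
from collections import deque

def bfs_subgraph(adjacency, seed, max_depth=2):
    """Return every node within max_depth hops of seed, with its hop count."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth

adjacency = {
    "Attention": ["Softmax", "MultiHead"],
    "MultiHead": ["Attention", "Block"],
    "Block": ["MultiHead", "GPT"],
    "GPT": ["Block", "Trainer"],
    "Softmax": ["Attention"],
    "Trainer": ["GPT"],
}
nearby = bfs_subgraph(adjacency, "Attention", max_depth=2)
```

The size of the result depends on the depth limit and local connectivity rather than on the total corpus size, which is consistent with query cost staying small as the corpus grows.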

Is Graphify free for commercial use? #

Yes. Graphify is MIT-licensed and free for both personal and commercial use.
