Every code-intelligence tool I've ever used falls into one of two traps.
The first is the grep-and-read loop: you (or your AI agent) search for a name, open ten files, read around the matches, follow an import, search again. It works, but it's slow, it burns tokens, and it has no idea that the `process_order` you found in `services.py` is the same `process_order` that gets called from `api.py`, as opposed to the unrelated one in `tests/`.
The second is the single-language silo: tools that understand Python beautifully but go blind the moment your TypeScript front end calls a Python FastAPI route. Real systems are polyglot. Your tooling usually isn't.
graphlens is an open-source (MIT) framework built to escape both traps. It parses a source project and normalizes its structure into a shared graph intermediate representation (IR):
Repository → Language Adapter → GraphLens (IR) → Graph Backend
| Layer | Responsibility |
|---|---|
| Language Adapter | Parses source files, produces a `GraphLens` |
| `GraphLens` | Typed nodes + directed relations: the intermediate representation |
| Graph Backend | Persists or queries the graph (Neo4j, in-memory, your own) |
The key design decision: adapters are pure data producers. They never write to a database, never touch the filesystem after reading, never run a server. The graph is the only output. That makes the whole pipeline trivially testable, cacheable, and serializable.
pip install "graphlens-cli[python]"
graphlens analyze ./my-project
graphlens · my-project
nodes: 1240
relations: 3981
resolver: ok
nodes by kind relations by kind
FUNCTION 410 CONTAINS 980
METHOD 265 DECLARES 870
CLASS 98 CALLS 640
MODULE 54 REFERENCES 410
Or from Python:
from pathlib import Path
from graphlens import adapter_registry
adapter = adapter_registry.load("python")()
graph = adapter.analyze(Path("./my-project"))
print(len(graph.nodes), "nodes,", len(graph.relations), "relations")
fn = graph.nodes_by_name("process_order")[0]
print("called by:", [n.name for n in graph.callers(fn.id)])
Most lightweight code-graph tools resolve references by name: see a call to `save()`, draw an edge to anything called `save`. That's fast and wrong: there are usually a dozen `save`s in a codebase.
graphlens splits the work in two:
A type-aware resolver is asked `definition_at(file, line, col)` for each occurrence. The resolved definition becomes a real edge to the defining node.
| Language | Resolver | Engine |
|---|---|---|
| Python | `TyResolver` | `ty` |
| TypeScript | `TsResolver` | |
| Go | `GoplsResolver` | `gopls` |
| Rust | `RustAnalyzerResolver` | `rust-analyzer` |
So a `CALLS` edge points at the real function, a `HAS_TYPE` edge at the real class, an `INHERITS_FROM` edge at the real base. This is the difference between "probably related" and "is related".
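To make the difference concrete, here is a toy sketch (not graphlens code). The `Definition` class, the `DEFINITIONS` list, and the `RESOLVED` lookup table are all hypothetical stand-ins; in a real pipeline the position lookup would be answered by a type-aware engine rather than a hard-coded dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    qualified_name: str
    file: str
    line: int


# Three unrelated definitions that happen to share the short name "save".
DEFINITIONS = [
    Definition("orders.models.Order.save", "orders/models.py", 12),
    Definition("users.models.User.save", "users/models.py", 30),
    Definition("tests.fakes.save", "tests/fakes.py", 5),
]

# Hypothetical resolver answers: call site (file, line, col) -> definition.
RESOLVED = {
    ("orders/api.py", 40, 10): DEFINITIONS[0],
}


def edges_by_name(name: str) -> list[str]:
    """Name matching: one edge per definition whose short name matches."""
    return [
        d.qualified_name
        for d in DEFINITIONS
        if d.qualified_name.rsplit(".", 1)[-1] == name
    ]


def edges_by_position(file: str, line: int, col: int) -> list[str]:
    """Position-based resolution: at most one edge, to the resolved definition."""
    target = RESOLVED.get((file, line, col))
    return [target.qualified_name] if target else []
```

Name matching returns three candidate targets for `save`, while the position lookup returns exactly one (or none, when the call site could not be resolved).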
Type analysis can degrade: a toolchain is missing, a file doesn't type-check. Instead of silently producing a half-resolved graph, graphlens records the outcome:
from graphlens import RESOLVER_STATUS_KEY
graph.metadata[RESOLVER_STATUS_KEY] # 'ok' | 'degraded' | 'unavailable'
In CI you flip on `--strict` and a non-`ok` status fails the build, so an agent or dashboard never consumes a graph that's quietly incomplete.
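The gating logic is simple enough to sketch. This is an illustration of the idea, not the CLI's implementation: the key string below is a stand-in (the real value of `RESOLVER_STATUS_KEY` isn't shown here), and `ci_exit_code` is a hypothetical helper.

```python
# Stand-in for the real constant; the actual key string is an assumption.
RESOLVER_STATUS_KEY = "resolver_status"


def ci_exit_code(metadata: dict, strict: bool) -> int:
    """Return a process exit code: non-zero when strict mode sees a non-ok graph."""
    # A missing status is treated as 'unavailable' (an assumption of this sketch).
    status = metadata.get(RESOLVER_STATUS_KEY, "unavailable")
    if strict and status != "ok":
        return 1
    return 0
```

Without strict mode a degraded graph still passes, which suits local exploration; with it, anything other than `ok` stops the pipeline.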
Nodes (`PROJECT`, `MODULE`, `FILE`, `CLASS`, `METHOD`, `FUNCTION`, `PARAMETER`, `VARIABLE`, `ATTRIBUTE`, `TYPE_ALIAS`, `IMPORT`, `DEPENDENCY`, `EXTERNAL_SYMBOL`, `BOUNDARY`) are frozen dataclasses with an id, kind, qualified name, file path, span, and free-form metadata.
Relations are directed, typed edges:
| Kind | Meaning |
|---|---|
| `CONTAINS` / `DECLARES` | structural containment & declaration |
| `IMPORTS` / `RESOLVES_TO` | import statements and where they resolve |
| `CALLS` / `REFERENCES` / `INHERITS_FROM` / `HAS_TYPE` | resolved, type-aware edges |
| `DEPENDS_ON` | declared package dependency |
| `EXPOSES` / `CONSUMES` / `COMMUNICATES_WITH` | cross-language boundaries |
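A rough picture of that data model, as a sketch only. The field names, the `(start_line, end_line)` span shape, and the `callers` helper are my assumptions for illustration and are not the library's actual class definitions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    id: str
    kind: str                      # e.g. "FUNCTION", "CLASS", "BOUNDARY"
    qualified_name: str
    file_path: str
    span: tuple[int, int]          # (start_line, end_line); assumed shape
    metadata: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    source: str                    # id of the node the edge starts from
    target: str                    # id of the node the edge points at
    kind: str                      # e.g. "CALLS", "CONTAINS"


def callers(relations: list[Relation], node_id: str) -> list[str]:
    """Ids of every node with a CALLS edge pointing at node_id."""
    return [r.source for r in relations if r.kind == "CALLS" and r.target == node_id]
```

Because the records are frozen, a query like `callers` can hand them out freely without worrying that a consumer mutates the graph underneath it.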
A node's ID is a SHA-256 hash of `project::kind::qualified_name`:
from graphlens import make_node_id
make_node_id("my-project", "my.module.func", "FUNCTION")
Because the ID depends only on identity, not file position, re-scanning yields the same IDs. That's what makes `graph.diff(other)` and incremental updates work, and what makes a graph cacheable in CI.
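The stability property is easy to see in a minimal sketch. `sketch_node_id` is a hypothetical stand-in for `make_node_id`; the exact byte encoding and output format the library uses are assumptions here.

```python
import hashlib


def sketch_node_id(project: str, kind: str, qualified_name: str) -> str:
    """Hash the identity triple; nothing about file position goes in."""
    payload = f"{project}::{kind}::{qualified_name}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same identity on two scans -> same ID, even if the function moved in the file.
first_scan = sketch_node_id("my-project", "FUNCTION", "my.module.func")
second_scan = sketch_node_id("my-project", "FUNCTION", "my.module.func")
```

Including the kind in the payload means a class and a function that share a qualified name still get distinct IDs.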
This is my favorite part. Adapters emit language-agnostic **BOUNDARY** nodes for the interfaces a service exposes or consumes (HTTP routes, queue topics, gRPC methods, Temporal activities) with an `EXPOSES` edge (provider) or `CONSUMES` edge (consumer).
A boundary's ID is `make_boundary_id(mechanism, key)`, with no project or language in it. HTTP paths are normalized so that `/users/1`, `/users/{user_id}` (FastAPI), `<int:id>` (Flask), and `:id` (Express) all collapse to `GET /users/{}`.
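Here is one way that normalization could look, as a sketch under stated assumptions. The regexes, `normalize_route`, and `sketch_boundary_id` (including its `mechanism::key` hash payload) are my own illustration and not graphlens's actual rules, which may handle more cases.

```python
import hashlib
import re

# Per-framework path-parameter syntaxes, matched against a whole segment.
_PARAM_PATTERNS = [
    re.compile(r"\{[^/{}]*\}"),    # FastAPI: {user_id}
    re.compile(r"<[^/<>]*>"),      # Flask: <int:id>
    re.compile(r":[A-Za-z_]\w*"),  # Express: :id
]


def normalize_route(method: str, path: str) -> str:
    """Collapse concrete values and framework placeholders to '{}'."""
    segments = []
    for segment in path.strip("/").split("/"):
        is_param = segment.isdigit() or any(
            p.fullmatch(segment) for p in _PARAM_PATTERNS
        )
        segments.append("{}" if is_param else segment)
    return f"{method.upper()} /" + "/".join(segments)


def sketch_boundary_id(mechanism: str, key: str) -> str:
    """No project or language in the payload, so both sides hash alike."""
    return hashlib.sha256(f"{mechanism}::{key}".encode("utf-8")).hexdigest()
```

With this in place, a FastAPI `/users/{user_id}` provider and an Express-style `/users/:id` consumer both reduce to the key `GET /users/{}` and therefore to the same boundary ID.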
The payoff: a Python FastAPI route and a TypeScript `fetch` to the same endpoint produce the same boundary ID. Merge the two graphs, run `graphlens-link`, and you get `COMMUNICATES_WITH` edges spanning the language gap:
from graphlens import adapter_registry
from graphlens_link import link_graph
py = adapter_registry.load("python")().analyze(python_project)
ts = adapter_registry.load("typescript")().analyze(typescript_project)
merged = py
merged.merge(ts, allow_shared=True) # identical BOUNDARY nodes coincide
result = link_graph(merged) # adds consumer -> provider edges
print(result.relations_added, "COMMUNICATES_WITH edges added")
Now you can answer "which front-end calls hit this endpoint?", a question no single-language tool can even represent.
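Conceptually, the linking step is a join on boundary ID. The sketch below is not `graphlens_link`'s implementation: relations are plain `(source, kind, target)` tuples, and I'm assuming `EXPOSES` and `CONSUMES` edges both point from the code node to the boundary node.

```python
def link(relations: list[tuple[str, str, str]]) -> list[tuple[str, str, str]]:
    """Pair every consumer of a boundary with every provider of the same boundary."""
    providers: dict[str, list[str]] = {}
    consumers: dict[str, list[str]] = {}
    for source, kind, target in relations:
        if kind == "EXPOSES":
            providers.setdefault(target, []).append(source)
        elif kind == "CONSUMES":
            consumers.setdefault(target, []).append(source)

    added = []
    for boundary_id, consumer_ids in consumers.items():
        for consumer in consumer_ids:
            for provider in providers.get(boundary_id, []):
                added.append((consumer, "COMMUNICATES_WITH", provider))
    return added


merged = [
    ("py:list_users", "EXPOSES", "boundary:GET /users/{}"),
    ("ts:fetchUser", "CONSUMES", "boundary:GET /users/{}"),
    ("ts:fetchOrders", "CONSUMES", "boundary:GET /orders"),  # no provider indexed
]
new_edges = link(merged)
```

A consumer whose boundary has no indexed provider simply yields no edge, which is itself a useful signal that a service is missing from the merged graph.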
As a library: load an adapter, get a `GraphLens`, and query it (callers, callees, references, neighborhoods, diffs, JSON round-trips, multi-language merges).
From the CLI, five subcommands cover the common workflows:
graphlens analyze ./repo --output graph.json # index
graphlens query process_order -g graph.json --op callers
graphlens visualize ./repo # interactive vis.js HTML
graphlens neo4j ./repo --uri bolt://localhost:7687
graphlens mcp --graph graph.json # serve to agents
In CI: `--strict` plus a Docker image (`ghcr.io/neko1313/graphlens`) with every adapter and toolchain pre-installed. Index on every push, publish the graph as an artifact, fail on a degraded graph.
To LLM agents over MCP: `graphlens mcp` exposes a saved graph as Model Context Protocol query tools (`stats`, `find`, `callers`, `callees`, `references`, `neighbors`, `boundaries`, `communicates_with`). Instead of dumping a codebase into the prompt, the agent asks precise questions and gets small structured answers: resolved edges, not best-effort text search.
As a Neo4j export: straight into a graph database with `UNWIND … MERGE` Cypher (no APOC required), then query it however you like.
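For readers who haven't used the pattern: plain Cypher can't take a node label as a parameter, so one APOC-free approach is to group rows by kind and issue one batched `UNWIND … MERGE` per label. The sketch below only builds the queries and parameters. The schema (one label per node kind, `id` and `name` properties) and the `build_node_batches` helper are assumptions of mine, not the exporter's actual output.

```python
import re
from collections import defaultdict

# Labels are interpolated into the query text, so restrict them to safe names.
_LABEL = re.compile(r"^[A-Z_]+$")


def build_node_batches(nodes: list[dict]) -> list[tuple[str, dict]]:
    """Return (cypher, params) pairs, one batched MERGE statement per node kind."""
    by_kind: dict[str, list[dict]] = defaultdict(list)
    for node in nodes:
        by_kind[node["kind"]].append({"id": node["id"], "name": node["name"]})

    batches = []
    for kind, rows in sorted(by_kind.items()):
        if not _LABEL.match(kind):
            raise ValueError(f"unsafe label: {kind!r}")
        query = (
            "UNWIND $rows AS row "
            f"MERGE (n:{kind} {{id: row.id}}) "
            "SET n.name = row.name"
        )
        batches.append((query, {"rows": rows}))
    return batches
```

Each pair can be passed to a driver session as query text plus parameters; because `MERGE` matches on `id`, re-running the export updates existing nodes rather than duplicating them.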
The core never imports an adapter. Each language is a separate package that registers itself via Python entry points:
[project.entry-points."graphlens.adapters"]
python = "graphlens_python:PythonAdapter"
Callers resolve adapters through a registry, by name string:
adapter_registry.available() # ['python', 'typescript', ...]
adapter = adapter_registry.load("python")()
Adding a new language means writing one package against the `LanguageAdapter` contract, with no changes to the core.
The scope is deliberately narrow, and the docs spell it out. graphlens produces a graph IR and stops there. It does not:
(`visualize` emits static HTML, `mcp` exposes query tools; neither hosts a long-running service.)
Those belong to tools built on top of graphlens. Keeping the core minimal is what keeps it composable.
Throughput on real-world projects, refreshed on every release inside the published Docker image (single cold run, indicative):
| Project | Lang | LOC | Nodes | Time | Resolved |
|---|---|---|---|---|---|
| apache/superset | python | 399 519 | 156 251 | 148.7s | 84% |
| colinhacks/zod | typescript | 74 194 | 8 741 | 19.0s | 91% |
| gin-gonic/gin | go | 23 672 | 7 227 | 13.9s | 100% |
| gohugoio/hugo | go | 224 821 | 34 809 | 112.7s | 99% |
| BurntSushi/ripgrep | rust | 50 275 | 9 612 | 113.1s | 99% |
pip install "graphlens-cli[python]"
graphlens analyze . --output graph.json
graphlens visualize .
The Python (`ty`) and TypeScript (Node) toolchains install on demand; Go and Rust adapters come via the Docker image.
If you've ever wanted a single, accurate, language-agnostic model of "how does this codebase actually fit together", that's exactly what graphlens hands you. I'd love feedback, issues, and adapter contributions.