Source: dev.to

graphlens: a polyglot code-analysis framework that turns your repo into a typed graph

A developer released graphlens, an open-source code-analysis framework that parses source projects into a typed graph with resolved references. The tool uses language-specific adapters and resolvers to produce accurate edges for calls, types, and inheritance, avoiding the common pitfalls of grep-and-read loops and single-language silos.

Published Jun 22, 2026 · 6 min read

Every code-intelligence tool I've ever used falls into one of two traps.

The first is the grep-and-read loop: you (or your AI agent) search for a name, open ten files, read around the matches, follow an import, search again. It works, but it's slow, it burns tokens, and it has no idea that the process_order you found in services.py is the same process_order that gets called from api.py, and not the unrelated one in tests/.

The second is the single-language silo: tools that understand Python beautifully but go blind the moment your TypeScript front end calls a Python FastAPI route. Real systems are polyglot. Your tooling usually isn't.

graphlens is an open-source (MIT) framework built to escape both traps. It parses a source project, normalizes its structure into a shared intermediate representation (IR), and hands that graph to a backend:

Repository → Language Adapter → GraphLens (IR) → Graph Backend

| Layer | Responsibility |
|---|---|
| Language Adapter | Parses source files, produces a GraphLens |
| GraphLens | Typed nodes + directed relations: the intermediate representation |
| Graph Backend | Persists or queries the graph (Neo4j, in-memory, your own) |

The key design decision: adapters are pure data producers. They never write to a database, never touch the filesystem after reading, never run a server. The graph is the only output. That makes the whole pipeline trivially testable, cacheable, and serializable.
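To make the "pure data producer" idea concrete, here is a minimal sketch using only Python's standard ast module. It is not graphlens's adapter code: Node, Relation, Graph, and analyze_source are hypothetical names invented for illustration, and the ID format is made up. The only point is that source text goes in and a graph comes out, with no I/O in between.

```python
import ast
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:  # hypothetical, not the graphlens node type
    id: str
    kind: str
    name: str


@dataclass(frozen=True)
class Relation:  # hypothetical, not the graphlens relation type
    source: str
    target: str
    kind: str


@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    relations: list = field(default_factory=list)


def analyze_source(module_name: str, source: str) -> Graph:
    """Pure function: source text in, graph out. No database, no server."""
    graph = Graph()
    module_id = f"MODULE:{module_name}"
    graph.nodes.append(Node(module_id, "MODULE", module_name))
    # Only top-level statements are inspected in this sketch.
    for stmt in ast.parse(source).body:
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "FUNCTION"
        elif isinstance(stmt, ast.ClassDef):
            kind = "CLASS"
        else:
            continue
        node_id = f"{kind}:{module_name}.{stmt.name}"
        graph.nodes.append(Node(node_id, kind, stmt.name))
        graph.relations.append(Relation(module_id, node_id, "DECLARES"))
    return graph
```

Because the function touches nothing but its arguments, a test can feed it a string literal and compare the resulting graph directly, which is the testability property described above.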

pip install "graphlens-cli[python]"
graphlens analyze ./my-project
graphlens · my-project
  nodes:      1240
  relations:  3981
  resolver:   ok

nodes by kind        relations by kind
  FUNCTION    410       CONTAINS    980
  METHOD      265       DECLARES    870
  CLASS        98       CALLS       640
  MODULE       54       REFERENCES  410

Or from Python:

from pathlib import Path
from graphlens import adapter_registry

adapter = adapter_registry.load("python")()
graph = adapter.analyze(Path("./my-project"))

print(len(graph.nodes), "nodes,", len(graph.relations), "relations")

fn = graph.nodes_by_name("process_order")[0]
print("called by:", [n.name for n in graph.callers(fn.id)])

Most lightweight code-graph tools resolve references by name: see a call to save(), draw an edge to anything called save. That's fast and wrong, because there are usually a dozen saves in a codebase.

graphlens splits the work in two: a language adapter parses the source, and a resolver looks up definition_at(file, line, col) for each occurrence. The resolved definition becomes a real edge to the defining node.

| Language | Resolver | Engine |
|---|---|---|
| Python | TyResolver | ty |
| TypeScript | TsResolver | |
| Go | GoplsResolver | gopls |
| Rust | RustAnalyzerResolver | rust-analyzer |

So a CALLS edge points at the real function, a HAS_TYPE edge at the real class, an INHERITS_FROM edge at the real base. This is the difference between "probably related" and "is related".
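The gap between the two approaches is easy to see in a toy example. The sketch below is illustrative only: DEFINITIONS and ENGINE_ANSWERS are hand-written tables, and this definition_at is a dictionary lookup standing in for a real type engine, not the graphlens resolver.

```python
# Qualified name -> short name, for three unrelated "save" definitions.
DEFINITIONS = {
    "models.User.save": "save",
    "models.Order.save": "save",
    "cache.save": "save",
}

# Stand-in for a type engine: maps one call site (file, line, col)
# to the single definition that call actually refers to.
ENGINE_ANSWERS = {
    ("api.py", 12, 10): "models.Order.save",
}


def resolve_by_name(name: str) -> list:
    """Name-based resolution: every definition with a matching short name."""
    return sorted(q for q, short in DEFINITIONS.items() if short == name)


def definition_at(file: str, line: int, col: int):
    """Position-based resolution: at most one answer per occurrence."""
    return ENGINE_ANSWERS.get((file, line, col))
```

With these tables, resolve_by_name("save") yields three candidates for one call, while definition_at("api.py", 12, 10) yields exactly one, and an occurrence the engine cannot resolve yields None instead of a guess.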

Type analysis can degrade: a toolchain is missing, or a file doesn't type-check. Instead of silently producing a half-resolved graph, graphlens records the outcome:

from graphlens import RESOLVER_STATUS_KEY
graph.metadata[RESOLVER_STATUS_KEY]   # 'ok' | 'degraded' | 'unavailable'

In CI you flip on --strict and a non-ok status fails the build, so an agent or dashboard never consumes a graph that's quietly incomplete.
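If you consume graphs from your own scripts, the same gate is a few lines. This is a sketch of the idea, not how graphlens implements --strict: the key string below is a placeholder (in real code you would import RESOLVER_STATUS_KEY from graphlens), and strict_exit_code is a hypothetical helper name.

```python
# Placeholder value, an assumption for this sketch; the real constant
# comes from `from graphlens import RESOLVER_STATUS_KEY`.
RESOLVER_STATUS_KEY = "resolver_status"


def strict_exit_code(metadata: dict, strict: bool = True) -> int:
    """Return a process exit code: non-zero when strict and status is not 'ok'."""
    # A missing status is treated as 'unavailable' (a choice made here,
    # so that an unlabeled graph never passes a strict check).
    status = metadata.get(RESOLVER_STATUS_KEY, "unavailable")
    if strict and status != "ok":
        return 1
    return 0
```

A CI step could then end with sys.exit(strict_exit_code(graph.metadata)), so a degraded graph stops the pipeline before anything downstream reads it.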

Nodes (PROJECT, MODULE, FILE, CLASS, METHOD, FUNCTION, PARAMETER, VARIABLE, ATTRIBUTE, TYPE_ALIAS, IMPORT, DEPENDENCY, EXTERNAL_SYMBOL, BOUNDARY) are frozen dataclasses with an id, kind, qualified name, file path, span, and free-form metadata.

Relations are directed, typed edges:

| Kind | Meaning |
|---|---|
| CONTAINS / DECLARES | structural containment & declaration |
| IMPORTS / RESOLVES_TO | import statements and where they resolve |
| CALLS / REFERENCES / INHERITS_FROM / HAS_TYPE | resolved, type-aware edges |
| DEPENDS_ON | declared package dependency |
| EXPOSES / CONSUMES / COMMUNICATES_WITH | cross-language boundaries |

A node's ID is a SHA-256 hash of project::kind::qualified_name:

from graphlens import make_node_id
make_node_id("my-project", "my.module.func", "FUNCTION")

Because the ID depends only on identity, not file position, re-scanning yields the same IDs. That's what makes graph.diff(other) and incremental updates work, and what makes a graph cacheable in CI.
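Here is a rough sketch of why identity-based IDs make diffing cheap. It hashes the project::kind::qualified_name string with hashlib; the exact encoding, separator handling, and any truncation inside the real make_node_id are assumptions on my part, and sketch_node_id and diff_ids are hypothetical names.

```python
import hashlib


def sketch_node_id(project: str, kind: str, qualified_name: str) -> str:
    """Hash identity only: no file path, line, or column goes in."""
    raw = f"{project}::{kind}::{qualified_name}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def diff_ids(old_ids, new_ids) -> dict:
    """With stable IDs, a diff between two scans is plain set arithmetic."""
    old, new = set(old_ids), set(new_ids)
    return {"added": new - old, "removed": old - new}
```

Moving a function to a different line leaves its ID untouched, so it shows up in neither set; only a rename, a kind change, or a new or deleted symbol does.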

This is my favorite part. Adapters emit language-agnostic BOUNDARY nodes for the interfaces a service exposes or consumes (HTTP routes, queue topics, gRPC methods, Temporal activities) with an EXPOSES edge (provider) or CONSUMES edge (consumer).

A boundary's ID is make_boundary_id(mechanism, key), with no project or language in it. HTTP paths are normalized so that /users/1, /users/{user_id} (FastAPI), <int:id> (Flask), and :id (Express) all collapse to GET /users/{}.
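The normalization can be approximated with a handful of regular expressions. This is a sketch under my own assumptions, not the rules graphlens ships: normalize_path and boundary_key are hypothetical names, and treating any all-digit segment as a parameter is a simplification.

```python
import re

# One pattern per parameter style; each must match a whole path segment.
_PARAM_PATTERNS = [
    re.compile(r"^\{[^{}]*\}$"),     # FastAPI style: {user_id}
    re.compile(r"^<[^<>]*>$"),       # Flask style: <int:id>
    re.compile(r"^:[A-Za-z_]\w*$"),  # Express style: :id
    re.compile(r"^\d+$"),            # concrete numeric value: 1
]


def normalize_path(path: str) -> str:
    """Replace every parameter-like segment with the placeholder {}."""
    segments = []
    for segment in path.strip("/").split("/"):
        if any(p.match(segment) for p in _PARAM_PATTERNS):
            segments.append("{}")
        else:
            segments.append(segment)
    return "/" + "/".join(segments)


def boundary_key(method: str, path: str) -> str:
    """Build a language-neutral key such as 'GET /users/{}'."""
    return f"{method.upper()} {normalize_path(path)}"
```

All four spellings from the paragraph above produce the same key, which is what lets a provider in one language and a consumer in another land on the same boundary.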

The payoff: a Python FastAPI route and a TypeScript fetch to the same endpoint produce the same boundary ID. Merge the two graphs, run graphlens-link, and you get COMMUNICATES_WITH edges spanning the language gap:

from graphlens import adapter_registry
from graphlens_link import link_graph

py = adapter_registry.load("python")().analyze(python_project)
ts = adapter_registry.load("typescript")().analyze(typescript_project)

merged = py
merged.merge(ts, allow_shared=True)   # identical BOUNDARY nodes coincide
result = link_graph(merged)           # adds consumer → provider edges

print(result.relations_added, "COMMUNICATES_WITH edges added")

Now you can answer "which front-end calls hit this endpoint?", a question no single-language tool can even represent.
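Conceptually, linking is a join on the boundary ID. The sketch below shows that join on plain tuples; it is not the graphlens_link implementation, and link_by_boundary plus the (source, kind, boundary_id) tuple shape are assumptions made for illustration.

```python
from collections import defaultdict


def link_by_boundary(relations):
    """Join CONSUMES and EXPOSES edges that share a boundary ID.

    `relations` is an iterable of (source_id, kind, boundary_id) tuples.
    Returns (consumer_id, "COMMUNICATES_WITH", provider_id) tuples.
    """
    providers = defaultdict(list)
    consumers = defaultdict(list)
    for source, kind, boundary in relations:
        if kind == "EXPOSES":
            providers[boundary].append(source)
        elif kind == "CONSUMES":
            consumers[boundary].append(source)

    linked = []
    for boundary, consumer_ids in consumers.items():
        # .get avoids creating empty entries for boundaries with no provider.
        for provider in providers.get(boundary, []):
            for consumer in consumer_ids:
                linked.append((consumer, "COMMUNICATES_WITH", provider))
    return linked
```

A consumer whose boundary has no provider in the merged graph simply produces no edge, which is itself useful: those are calls to services you have not indexed.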

As a library: load an adapter, get a GraphLens, and query it (callers, callees, references, neighborhoods, diffs, JSON round-trips, multi-language merges).

From the CLI, five subcommands cover the common workflows:

graphlens analyze ./repo --output graph.json   # index
graphlens query process_order -g graph.json --op callers
graphlens visualize ./repo                      # interactive vis.js HTML
graphlens neo4j ./repo --uri bolt://localhost:7687
graphlens mcp --graph graph.json                # serve to agents

In CI: --strict plus a Docker image (ghcr.io/neko1313/graphlens) with every adapter and toolchain pre-installed. Index on every push, publish the graph as an artifact, fail on a degraded graph.

To LLM agents over MCP: graphlens mcp exposes a saved graph as Model Context Protocol query tools (stats, find, callers, callees, references, neighbors, boundaries, communicates_with). Instead of dumping a codebase into the prompt, the agent asks precise questions and gets small structured answers: resolved edges, not best-effort text search.

As a Neo4j export: straight into a graph database with UNWIND … MERGE Cypher (no APOC required), then query it however you like.
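For readers who haven't used the UNWIND … MERGE pattern, here is one way it can look without APOC. Everything in this sketch is an assumption rather than the Cypher graphlens actually emits: the Node label, the property names, and the relation_batches helper are hypothetical. It issues one statement per relation kind, with the kind validated and interpolated into the query text, because plain Cypher does not accept a relationship type as a query parameter.

```python
import re
from collections import defaultdict

# Batch upsert of nodes: one round trip, idempotent on re-run.
NODE_QUERY = (
    "UNWIND $rows AS row "
    "MERGE (n:Node {id: row.id}) "
    "SET n.kind = row.kind, n.name = row.name"
)


def relation_batches(relations):
    """Group (source_id, kind, target_id) tuples into one query per kind."""
    by_kind = defaultdict(list)
    for source, kind, target in relations:
        # The kind is spliced into query text, so accept only safe names.
        if not re.fullmatch(r"[A-Z_]+", kind):
            raise ValueError(f"unexpected relation kind: {kind!r}")
        by_kind[kind].append({"source": source, "target": target})

    batches = []
    for kind in sorted(by_kind):
        query = (
            "UNWIND $rows AS row "
            "MATCH (a:Node {id: row.source}) "
            "MATCH (b:Node {id: row.target}) "
            f"MERGE (a)-[:{kind}]->(b)"
        )
        batches.append((query, {"rows": by_kind[kind]}))
    return batches
```

Each (query, params) pair can be handed to a driver call such as session.run(query, params). Since every statement is a MERGE, re-exporting the same graph does not create duplicates.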

The core never imports an adapter. Each language is a separate package that registers itself via Python entry points:

[project.entry-points."graphlens.adapters"]
python = "graphlens_python:PythonAdapter"

Callers resolve adapters through a registry, by name string:

adapter_registry.available()        # ['python', 'typescript', ...]
adapter = adapter_registry.load("python")()

Adding a new language means writing one package against the LanguageAdapter contract, with no changes to the core.

The scope is deliberately narrow, and the docs spell it out. graphlens produces a graph IR and stops there. It does not, for instance, host a long-running service: visualize emits static HTML, and mcp exposes query tools.

Concerns like that belong to tools built on top of graphlens. Keeping the core minimal is what keeps it composable.

Throughput on real-world projects, refreshed on every release inside the published Docker image (single cold run, indicative):

| Project | Lang | LOC | Nodes | Time | Resolved |
|---|---|---|---|---|---|
| apache/superset | python | 399 519 | 156 251 | 148.7s | 84% |
| colinhacks/zod | typescript | 74 194 | 8 741 | 19.0s | 91% |
| gin-gonic/gin | go | 23 672 | 7 227 | 13.9s | 100% |
| gohugoio/hugo | go | 224 821 | 34 809 | 112.7s | 99% |
| BurntSushi/ripgrep | rust | 50 275 | 9 612 | 113.1s | 99% |

pip install "graphlens-cli[python]"
graphlens analyze . --output graph.json
graphlens visualize .

The Python (ty) and TypeScript (Node) toolchains install on demand; Go and Rust adapters come via the Docker image.

If you've ever wanted a single, accurate, language-agnostic model of "how does this codebase actually fit together", that's exactly what graphlens hands you. I'd love feedback, issues, and adapter contributions.
