{"slug": "graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed", "title": "graphlens: a polyglot code-analysis framework that turns your repo into a typed graph", "summary": "A developer released graphlens, an open-source code-analysis framework that parses source projects into a typed graph with resolved references. The tool uses language-specific adapters and resolvers to produce accurate edges for calls, types, and inheritance, avoiding the common pitfalls of grep-and-read loops and single-language silos.", "body_md": "Every code-intelligence tool I've ever used falls into one of two traps.\n\nThe first is the **grep-and-read loop**: you (or your AI agent) search for a name, open ten files, read around the matches, follow an import, search again. It works, but it's slow, it burns tokens, and it has no idea that the `process_order` you found in `services.py` is the *same* `process_order` that gets called from `api.py` — versus the unrelated one in `tests/`.\n\nThe second is the **single-language silo**: tools that understand Python beautifully but go blind the moment your TypeScript front end calls a Python FastAPI route. Real systems are polyglot. Your tooling usually isn't.\n\n[graphlens](https://github.com/Neko1313/graphlens) is an open-source (MIT) framework built to escape both traps. It parses a source project and normalizes its structure into a shared typed graph:\n\n```\nRepository → Language Adapter → GraphLens (IR) → Graph Backend\n```\n\n| Layer | Responsibility |\n|---|---|\n| Language Adapter | Parses source files, produces a `GraphLens` |\n| GraphLens | Typed nodes + directed relations — the intermediate representation |\n| Graph Backend | Persists or queries the graph (Neo4j, in-memory, your own) |\n\nThe key design decision: **adapters are pure data producers.** They never write to a database, never touch the filesystem after reading, never run a server. The graph is the only output. 
That makes the whole pipeline trivially testable, cacheable, and serializable.\n\n```\npip install \"graphlens-cli[python]\"\ngraphlens analyze ./my-project\ngraphlens · my-project\n  nodes:      1240\n  relations:  3981\n  resolver:   ok\n\nnodes by kind        relations by kind\n  FUNCTION    410       CONTAINS    980\n  METHOD      265       DECLARES    870\n  CLASS        98       CALLS       640\n  MODULE       54       REFERENCES  410\n```\n\nOr from Python:\n\n``` python\nfrom pathlib import Path\nfrom graphlens import adapter_registry\n\nadapter = adapter_registry.load(\"python\")()\ngraph = adapter.analyze(Path(\"./my-project\"))\n\nprint(len(graph.nodes), \"nodes,\", len(graph.relations), \"relations\")\n\nfn = graph.nodes_by_name(\"process_order\")[0]\nprint(\"called by:\", [n.name for n in graph.callers(fn.id)])\n```\n\nMost lightweight code-graph tools resolve references by name: see a call to `save()`, draw an edge to anything called `save`. That's fast and wrong — there are usually a dozen `save`s in a codebase.\n\ngraphlens splits the work in two: the language adapter parses the source files, and a language-specific resolver is asked `definition_at(file, line, col)` for each occurrence. The resolved definition becomes a real edge to the node that defines it.\n\n| Language | Resolver | Engine |\n|---|---|---|\n| Python | `TyResolver` | `ty` |\n| TypeScript | `TsResolver` | |\n| Go | `GoplsResolver` | `gopls` |\n| Rust | `RustAnalyzerResolver` | `rust-analyzer` |\n\nSo a `CALLS` edge points at the real function, a `HAS_TYPE` edge at the real class, an `INHERITS_FROM` edge at the real base. This is the difference between \"probably related\" and \"is related\".\n\nType analysis can degrade — a toolchain is missing, a file doesn't type-check. 
Instead of silently producing a half-resolved graph, graphlens records the outcome:\n\n``` python\nfrom graphlens import RESOLVER_STATUS_KEY\ngraph.metadata[RESOLVER_STATUS_KEY]   # 'ok' | 'degraded' | 'unavailable'\n```\n\nIn CI you flip on `--strict` and a non-`ok` status fails the build, so an agent or dashboard never consumes a graph that's quietly incomplete.\n\n**Nodes** (`PROJECT`, `MODULE`, `FILE`, `CLASS`, `METHOD`, `FUNCTION`, `PARAMETER`, `VARIABLE`, `ATTRIBUTE`, `TYPE_ALIAS`, `IMPORT`, `DEPENDENCY`, `EXTERNAL_SYMBOL`, `BOUNDARY`) are frozen dataclasses with an id, kind, qualified name, file path, span, and free-form metadata.\n\n**Relations** are directed, typed edges:\n\n| Kind | Meaning |\n|---|---|\n| `CONTAINS` / `DECLARES` | structural containment & declaration |\n| `IMPORTS` / `RESOLVES_TO` | import statements and where they resolve |\n| `CALLS` / `REFERENCES` / `INHERITS_FROM` / `HAS_TYPE` | resolved, type-aware edges |\n| `DEPENDS_ON` | declared package dependency |\n| `EXPOSES` / `CONSUMES` / `COMMUNICATES_WITH` | cross-language boundaries |\n\nA node's ID is a SHA-256 hash of `project::kind::qualified_name`:\n\n``` python\nfrom graphlens import make_node_id\nmake_node_id(\"my-project\", \"my.module.func\", \"FUNCTION\")\n# → the same id every scan, on every machine\n```\n\nBecause the ID depends only on identity, not file position, re-scanning yields the same IDs. That's what makes `graph.diff(other)` and incremental updates work — and what makes a graph cacheable in CI.\n\nThis is my favorite part. Adapters emit language-agnostic **BOUNDARY** nodes for the interfaces a service exposes or consumes — HTTP routes, queue topics, gRPC methods, Temporal activities — with an `EXPOSES` edge (provider) or `CONSUMES` edge (consumer).\n\nA boundary's ID is `make_boundary_id(mechanism, key)` — *no project or language in it*. 
HTTP paths are normalized so that `/users/1`, `/users/{user_id}` (FastAPI), `<int:id>` (Flask), and `:id` (Express) all collapse to `GET /users/{}`.\n\nThe payoff: a Python FastAPI route and a TypeScript `fetch` to the same endpoint produce the **same** boundary ID. Merge the two graphs, run `graphlens-link`, and you get `COMMUNICATES_WITH` edges spanning the language gap:\n\n``` python\nfrom pathlib import Path\n\nfrom graphlens import adapter_registry\nfrom graphlens_link import link_graph\n\n# Hypothetical project roots; point these at your own checkouts.\npython_project = Path(\"./backend\")\ntypescript_project = Path(\"./frontend\")\n\npy = adapter_registry.load(\"python\")().analyze(python_project)\nts = adapter_registry.load(\"typescript\")().analyze(typescript_project)\n\nmerged = py\nmerged.merge(ts, allow_shared=True)   # identical BOUNDARY nodes coincide\nresult = link_graph(merged)           # adds consumer → provider edges\n\nprint(result.relations_added, \"COMMUNICATES_WITH edges added\")\n```\n\nNow you can answer \"which front-end calls hit this endpoint?\" — a question no single-language tool can even represent.\n\n**As a library** — load an adapter, get a `GraphLens`, query it: callers, callees, references, neighborhoods, diffs, JSON round-trips, multi-language merges.\n\n**From the CLI** — five subcommands cover the common workflows:\n\n```\ngraphlens analyze ./repo --output graph.json   # index\ngraphlens query process_order -g graph.json --op callers\ngraphlens visualize ./repo                      # interactive vis.js HTML\ngraphlens neo4j ./repo --uri bolt://localhost:7687\ngraphlens mcp --graph graph.json                # serve to agents\n```\n\n**In CI** — `--strict` plus a Docker image (`ghcr.io/neko1313/graphlens`) with every adapter and toolchain pre-installed. 
Index on every push, publish the graph as an artifact, fail on a degraded graph.\n\n**To LLM agents over MCP** — `graphlens mcp` exposes a saved graph as Model Context Protocol query tools (`stats`, `find`, `callers`, `callees`, `references`, `neighbors`, `boundaries`, `communicates_with`). Instead of dumping a codebase into the prompt, the agent asks precise questions and gets small structured answers — resolved edges, not best-effort text search.\n\n**As a Neo4j export** — straight into a graph database with `UNWIND … MERGE` Cypher (no APOC required), then query it however you like.\n\nThe core never imports an adapter. Each language is a separate package that registers itself via Python entry points:\n\n```\n[project.entry-points.\"graphlens.adapters\"]\npython = \"graphlens_python:PythonAdapter\"\n```\n\nCallers resolve adapters through a registry, by name string:\n\n```\nadapter_registry.available()        # ['python', 'typescript', ...]\nadapter = adapter_registry.load(\"python\")()\n```\n\nAdding a new language means writing one package against the `LanguageAdapter` contract — no changes to the core.\n\nThe scope is deliberately narrow, and the docs spell it out. graphlens produces a graph IR and stops there. It does **not**:\n\n- run a server (`visualize` emits static HTML, `mcp` exposes query tools — neither hosts a long-running service).\n\nThose belong to tools built *on top of* graphlens. 
Keeping the core minimal is what keeps it composable.\n\nThroughput on real-world projects, refreshed on every release inside the published Docker image (single cold run, indicative):\n\n| Project | Lang | LOC | Nodes | Time | Resolved |\n|---|---|---|---|---|---|\n| apache/superset | python | 399 519 | 156 251 | 148.7s | 84% |\n| colinhacks/zod | typescript | 74 194 | 8 741 | 19.0s | 91% |\n| gin-gonic/gin | go | 23 672 | 7 227 | 13.9s | 100% |\n| gohugoio/hugo | go | 224 821 | 34 809 | 112.7s | 99% |\n| BurntSushi/ripgrep | rust | 50 275 | 9 612 | 113.1s | 99% |\n\n```\npip install \"graphlens-cli[python]\"\ngraphlens analyze . --output graph.json\ngraphlens visualize .\n```\n\nThe Python (`ty`) and TypeScript (Node) toolchains install on demand; Go and Rust adapters come via the Docker image.\n\nIf you've ever wanted a single, accurate, language-agnostic model of \"how does this codebase actually fit together\" — that's exactly what graphlens hands you. I'd love feedback, issues, and adapter contributions.", "url": "https://wpnews.pro/news/graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed", "canonical_source": "https://dev.to/neko1313_4/graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed-graph-4mhi", "published_at": "2026-06-22 01:19:45+00:00", "updated_at": "2026-06-22 01:39:48.960401+00:00", "lang": "en", "topics": ["developer-tools"], "entities": ["graphlens", "Neo4j", "Python", "TypeScript", "FastAPI"], "alternates": {"html": "https://wpnews.pro/news/graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed", "markdown": "https://wpnews.pro/news/graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed.md", "text": "https://wpnews.pro/news/graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed.txt", "jsonld": "https://wpnews.pro/news/graphlens-a-polyglot-code-analysis-framework-that-turns-your-repo-into-a-typed.jsonld"}}