Hi, I'm Ryan, CTO at airCloset.
In Part 1, I wrote about unifying 46 repositories of production code into a single knowledge graph via static analysis. The graph itself got built, but I closed the post with four open issues: no semantic search, node explosion, having to open the file to actually know what a function does, and the cost of writing a new parser every time a new boundary pattern showed up.
This Part 2 is about how I solved the first one: the entry-point problem (no semantic search). The other three are left exactly as Part 1 described them. I'll come back to them at the end, together with the new issues that surfaced once the entry-point problem was out of the way.
The reason to start with the entry-point problem is simple: if the graph exists but the only way to reach it is grep, the model ends up inferring anyway. The whole point ("give the model verified facts, not inference") falls apart. So the entry-point problem had to be solved before the others.
Months earlier, I'd already solved the same structural problem in a different domain: the db-graph project.
Internally, we had a large number of DB tables spread across many services, and no single person had the full picture. Different people knew different pieces well, but the whole map didn't fit in anyone's head. So I built db-graph: extract schemas statically from ORM definitions, generate per-table descriptions with Gemini, embed them as 768-dimensional vectors in the graph, and make the whole thing semantically searchable in natural language.
At the time of that article it covered 991 tables. Today it spans 21 schemas / 1,133 tables / 10,815 columns, and finding data in natural language without knowing table names is just how people work now.
The pattern that proved out there:
Static-analysis graph + AI-generated context = natural-language semantic search works.
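As a rough illustration of that pattern, here is a minimal sketch of vector search over AI-generated table descriptions. Everything in it is an assumption made for illustration: the table names, the descriptions, and the 3-dimensional toy vectors standing in for the 768-dimensional embeddings. The real db-graph stores its vectors in the graph and does not necessarily rank this way.

```python
import math

# Hypothetical table nodes: a static schema fact (the table name) plus an
# AI-generated description and its embedding (toy 3-dim vectors here).
TABLE_NODES = [
    {"table": "rental_items", "description": "Items a member is currently renting", "embedding": [0.9, 0.1, 0.0]},
    {"table": "billing_invoices", "description": "Monthly subscription invoices", "embedding": [0.1, 0.9, 0.1]},
    {"table": "delivery_orders", "description": "Outbound and return shipments", "embedding": [0.7, 0.0, 0.6]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_vec, nodes, top_k=2):
    """Return the names of the top_k tables closest to the query vector."""
    scored = sorted(
        ((cosine(query_vec, node["embedding"]), node["table"]) for node in nodes),
        reverse=True,
    )
    return [table for _, table in scored[:top_k]]


# The query vector would come from embedding a natural-language question.
print(search([1.0, 0.0, 0.1], TABLE_NODES))
```

The point of the sketch is that the caller never needs to know a table name: the question is embedded, and the graph answers with nodes.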
If it worked for db-graph, it should work for code-graph. The moment that thought landed, I noticed something:
code-graph already contains "DB table nodes" as boundary nodes; they're one of the boundary node types I covered in Part 1.
So if I just join code-graph and db-graph, code-graph automatically inherits db-graph's semantic context. Without writing a single annotation, the existing assets alone make the graph meaningfully richer.
That's where the idea of "joining graphs" first came up: not treating each graph as its own island, but designing the joins between them.
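To make the "inherit context by joining" idea concrete, here is a small sketch. The node shape (`type`, `name`), the table names, and the descriptions are hypothetical; the only thing taken from the text is that code-graph has DB table boundary nodes and db-graph has a description per table.

```python
def inherit_db_context(code_nodes, db_descriptions):
    """Attach db-graph descriptions to code-graph DB-table boundary nodes.

    The join key is the table name. Nodes without a db-graph match are
    passed through unchanged.
    """
    enriched = []
    for node in code_nodes:
        if node["type"] == "db_table" and node["name"] in db_descriptions:
            node = {**node, "description": db_descriptions[node["name"]]}
        enriched.append(node)
    return enriched


# Hypothetical inputs for illustration only.
code_nodes = [
    {"type": "function", "name": "getWearing"},
    {"type": "db_table", "name": "admin_rental_items"},
    {"type": "db_table", "name": "legacy_unknown"},
]
db_descriptions = {"admin_rental_items": "Items currently rented out to members"}

for node in inherit_db_context(code_nodes, db_descriptions):
    print(node)
```

No annotation is written anywhere in this step; the extra meaning comes entirely from data db-graph already holds.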
Joining db-graph took care of DB context. But the remaining boundaries (API / Event) and the graph's entry-point type (Page) still need meaning attached. Static analysis alone can't pull intent out of those, so context has to come from somewhere else.
The choice was clear: write the intent directly into the code via annotations (the same approach used by cortex's internal knowledge graph, which I covered in AI Harness Series, Part 2).
The catch: you can't annotate all the functions across 46 repos. There must be tens of thousands of them. Asking established teams running an existing production codebase to retroactively annotate everything is just not realistic.
But here's the second realization:
What matters is just the boundary nodes. So if I only annotate around the boundaries, that's enough.
When an AI agent asks "what breaks if I change this code" or "what other repos call this API," what it needs isn't a per-function logic explanation. It needs boundary intent: what is this screen for, what does this API return, what milestone in the business does this Event mark.
= Minimum annotations, maximum meaning. That became the heart of the design.
Putting it together (internally we call this annotation graph service-product-graph, or SPG):
Three graphs sit as peers, joined by SAME_ENTITY edges. There's no hierarchy β you can start from any graph and reach the others.
The annotation graph is built from @graph-* tags written only around boundaries.
The entry point for AI agents is a single MCP server that traverses all three graphs. AI agents never hit db-graph directly; the annotation graph's MCP server proxies db-graph calls on their behalf.
The annotation graph has 7 node types: Page / Section / Dialog / Field / Action / Api / Task. The early version was screen-focused and called screen-graph, but once it grew to cover backend Api / Task, it was renamed to service-product-graph.
Here's what an annotation looks like (fictional, but close in shape to the real ones):
/**
* @graph-page /home
* @graph-business Main screen. Members can see what they're currently renting, buy items, and initiate returns.
* @graph-label Home Screen
* @graph-has-section banners, wearing-items, wearing-return, delivery-status
* @graph-has-dialog buying-modal, return-modal
* @graph-navigates-to /return-procedure, /checkout, /my-karte
* @graph-calls GET /api/v1/wearing
* @graph-reads admin_delivery_orders, admin_rental_items
* @graph-flow styling-loop
* @graph-status monthly-member
*/
Two things matter here:
@graph-business
@graph-flow / @graph-status
There's also @graph-case (the conditional pattern tag that test cases derive from), but that's for another time.
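For a sense of how little machinery such tags need, here is a minimal sketch of a tag extractor. It is not the actual SPG parser: the regex, the choice of which tags are treated as comma-separated lists, and the output shape are all assumptions.

```python
import re

# Matches "@graph-<name> <value>" anywhere on a comment line.
TAG_RE = re.compile(r"@graph-([a-z-]+)\s+(.*)")

# Assumption: these tags carry comma-separated lists; the rest are single values.
LIST_TAGS = {"has-section", "has-dialog", "navigates-to", "reads"}


def parse_annotation(comment):
    """Turn a block comment containing @graph-* tags into a dict."""
    tags = {}
    for line in comment.splitlines():
        match = TAG_RE.search(line)
        if not match:
            continue
        name, value = match.group(1), match.group(2).strip()
        if name in LIST_TAGS:
            tags[name] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            tags[name] = value
    return tags


sample = """/**
 * @graph-page /home
 * @graph-label Home Screen
 * @graph-has-dialog buying-modal, return-modal
 * @graph-calls GET /api/v1/wearing
 */"""

print(parse_annotation(sample))
```

Because the tags are plain text in a comment, the same extractor works regardless of the language of the file they sit in.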
This is where it gets practical.
Once I committed to building annotation graph, here were the constraints:
In other words: don't mix humans and AI inside the same PR.
The solution was to physically separate annotations onto their own branch.
This is the "every line of code passes through an AI gate" ideal from AI Harness Series, Part 6, adapted to the constraints of an existing organization. cortex (the internal AI platform) is a monorepo I assemble from scratch, so "every commit passes the AI gate" actually holds there. For the 46-repo production system, that precondition doesn't hold. So instead of giving up on the ideal, I split it: engineers' workflow on one branch, AI's annotation workflow on another, both running in parallel.
Just running the annotation pipeline doesn't guarantee the quality of the joins between the three graphs (code-graph / db-graph / annotation graph). So there's a set of SLOs that automatically check the consistency across the entire graph.
The main rules:
HANDLES_API handlers must have downstream function calls (= no handlers that receive an API and then do nothing)
These are really just a naive question ("shouldn't the boundaries connect to each other?") turned into an SLO. If anything drops below threshold, an alert fires, and the trustworthiness of the whole graph gets defended every day.
The daily boundary-analysis cron from Part 1 (5% connection-rate drop = alert) was code-graph-only. This is a cross-graph SLO: it guards the joins between graphs themselves. Add a parser to one repo, write a new annotation, change a schema: whatever happens, by the next morning a quality drop in any join becomes visible.
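One such rule, the HANDLES_API check, could look roughly like the sketch below. The edge tuple layout `(kind, source, target)`, the `CALLS` edge name, and the 0.95 threshold are assumptions for illustration; only `HANDLES_API` and the "handler must call something downstream" rule come from the text.

```python
def handler_connectivity(edges):
    """Share of HANDLES_API handlers with at least one outgoing CALLS edge.

    Each edge is a (kind, source, target) tuple. For HANDLES_API the source
    is assumed to be the handler function and the target the API node.
    """
    handlers = {src for kind, src, _ in edges if kind == "HANDLES_API"}
    callers = {src for kind, src, _ in edges if kind == "CALLS"}
    if not handlers:
        return 1.0  # nothing to check, so nothing is violated
    return len(handlers & callers) / len(handlers)


def check_slo(edges, threshold=0.95):
    """Return the measured rate and whether it falls below the threshold."""
    rate = handler_connectivity(edges)
    return {"rate": rate, "alert": rate < threshold}


# Hypothetical edges: one healthy handler, one that does nothing downstream.
edges = [
    ("HANDLES_API", "getWearing", "GET /api/v1/wearing"),
    ("HANDLES_API", "noopHandler", "GET /api/v1/ping"),
    ("CALLS", "getWearing", "loadRentalItems"),
]

print(check_slo(edges))
```

Run daily, a check of this shape turns "shouldn't the boundaries connect?" into a number that can trip an alert.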
I've been writing "join" casually, but the actual joining wasn't that straightforward.
Static-analysis API / Page / Task nodes and annotation graph API / Page / Task nodes are created as separate nodes. They mean the same thing, but their names / paths / identifiers don't match by themselves; there's nothing automatic about lining them up.
To connect them, we generate a separate edge type called SAME_ENTITY. There are three bridges:
/console/api/ to /api/
/v1.x/ → /
/:id, /{id} to /:dynamic
? → strip trailing :dynamic? → finally fall back to a dynamic-dispatch boundary :dynamic, loosening progressively
There was also one operational footgun. The first implementation used INSERT NOT EXISTS to avoid duplicates. But BigQuery's streaming-buffer visibility lag let duplicates slip in; in one repo the edges doubled from 106 to 214 overnight. We fixed it by rewriting to MERGE INTO to make the operation idempotent.
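The path normalization and progressive fallback described above can be sketched as follows. This is my reading of the rules, not the production implementation: the regexes, the stage labels, and the exact order of the fallback stages are assumptions. Edges are kept in a dict keyed by the node pair, so re-running the bridge cannot create duplicates, which is the same property MERGE provides on the BigQuery side.

```python
import re


def normalize(path):
    """Apply the three normalizations: prefix, version, path parameters."""
    path = path.replace("/console/api/", "/api/")
    path = re.sub(r"/v\d+(?:\.[0-9x]+)?/", "/", path)        # /v1/ or /v1.x/ -> /
    path = re.sub(r"/(?::\w+|\{\w+\})", "/:dynamic", path)   # /:id, /{id} -> /:dynamic
    return path


def candidates(path):
    """Yield progressively looser forms of a path, strictest first."""
    p = normalize(path)
    out = [("exact", p)]
    if p.endswith("?"):
        out.append(("no-optional-marker", p[:-1]))
    if p.endswith("/:dynamic?"):
        out.append(("no-optional-segment", p[: -len("/:dynamic?")]))
    out.append(("dynamic-dispatch", ":dynamic"))
    return out


def bridge(annotation_paths, code_boundaries):
    """Build SAME_ENTITY pairs, recording which stage produced each match."""
    index = {normalize(b): b for b in code_boundaries}
    edges = {}
    for path in annotation_paths:
        for stage, candidate in candidates(path):
            if candidate in index:
                edges[(path, index[candidate])] = stage
                break
    return edges


# Hypothetical paths for illustration only.
print(bridge(
    ["/console/api/v1/wearing", "/api/items/:id?", "/api/unknown"],
    ["/api/wearing", "/api/items", ":dynamic"],
))
```

Recording the stage alongside each edge is a design choice of this sketch: it makes it possible to see later how many joins rest on a strict match and how many on the loosest fallback.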
With all of this in place, the entry-point problem from the end of Part 1 was finally solved:
"the subscription-fee calculation for members seems off"
Throw this natural-language query at annotation graph and vector search returns the related nodes (Page / Api / Function / DB table) as facts. From there, SAME_ENTITY takes you over to code-graph functions, including callers and callees in other repos. From the DB boundaries in code-graph, you can cross into db-graph and pull the relevant columns.
The entry point can be anywhere: "what calls this table?" starts from db-graph, "what's the blast radius of this function?" starts from code-graph, and both walk the same connected network. From a single natural-language query, or from a specific node, you can now traverse all three graphs and get every relevant piece of code plus every relevant DB schema.
The Part 1 lament ("the graph is there but the entry point is missing") could finally be put to bed.
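A toy version of that walk, assuming a flat edge list with graph-prefixed node ids, might look like this. The node ids, the edge names other than SAME_ENTITY and HANDLES_API, and the decision to treat every edge as walkable in both directions are simplifications made for the sketch.

```python
from collections import deque

# Hypothetical edges spanning the annotation graph (spg), code-graph and db-graph.
EDGES = [
    ("spg:page:/home", "CALLS_API", "spg:api:GET /api/v1/wearing"),
    ("spg:api:GET /api/v1/wearing", "SAME_ENTITY", "code:api:/api/wearing"),
    ("code:fn:getWearing", "HANDLES_API", "code:api:/api/wearing"),
    ("code:fn:getWearing", "READS", "code:table:admin_rental_items"),
    ("code:table:admin_rental_items", "SAME_ENTITY", "db:table:admin_rental_items"),
]


def reachable(start, edges, max_hops=6):
    """Breadth-first walk; returns {node: hop count} for everything reached."""
    adjacency = {}
    for src, _, dst in edges:
        # Every edge is walkable both ways so any graph can be the start.
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


print(reachable("spg:page:/home", EDGES))
print(reachable("db:table:admin_rental_items", EDGES))
```

Starting from the page or from the table reaches the same set of nodes, which is what "no hierarchy between the graphs" means in practice.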
From 2026-04-16 (first production deployment) to the time of writing, about 2.5 months, the annotation graph's MCP server has handled ~50,000 calls from ~73 users. The breakdown:
The interesting line is the second one. "Search the codebase in natural language" is usually an engineer's tool, but once the entry-point problem was solved, people outside engineering started using it too, asking things like "how does this feature actually work?" or "what's in this DB?" in their own words.
This is adjacent to the "non-engineers writing specs with AI" trend I covered in AI Harness Series, Part 5: a graph that can be queried by meaning starts to matter org-wide. Call volume is overwhelmingly dominated by engineers, of course. The interesting thing is the range of job roles starting to pick it up. That's the real impact of solving the entry-point problem.
The MCP server is the cross-graph entry point. It exposes six tools (service search / service detail / API detail / data-flow tracing / impact-radius tracing / business-rule full-text search), and that's the only entry point AI agents ever touch.
One design choice worth calling out: AI agents never talk to db-graph directly. The annotation graph's MCP proxies db-graph calls. From the agent's side, the mental model stays simple: "ask one MCP and get everything back."
That makes the full chain ("Screen → API → Code → DB → Column") traversable in a single MCP tool call.
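The proxy design can be illustrated without any MCP machinery. The sketch below is plain Python, not an MCP SDK: the class names, the `api_detail` tool name, the column names, and the in-memory lookups are all hypothetical. It only shows the shape of "one entry point that fetches db-graph data on the agent's behalf."

```python
class DbGraphClient:
    """Stand-in for db-graph; here it is just an in-memory lookup."""

    def __init__(self, columns_by_table):
        self._columns = columns_by_table

    def columns(self, table):
        return self._columns.get(table, [])


class GraphGateway:
    """Single entry point: agents call tools here and never see db-graph."""

    def __init__(self, api_reads, db_graph):
        self._api_reads = api_reads  # API -> tables it reads (from annotations)
        self._db = db_graph
        self._tools = {"api_detail": self.api_detail}

    def call(self, tool, **kwargs):
        if tool not in self._tools:
            raise KeyError(f"unknown tool: {tool}")
        return self._tools[tool](**kwargs)

    def api_detail(self, api):
        # The db-graph lookup is proxied here, inside the gateway.
        tables = self._api_reads.get(api, [])
        return {"api": api, "tables": {t: self._db.columns(t) for t in tables}}


gateway = GraphGateway(
    api_reads={"GET /api/v1/wearing": ["admin_rental_items"]},
    db_graph=DbGraphClient({"admin_rental_items": ["id", "member_id", "status"]}),
)

print(gateway.call("api_detail", api="GET /api/v1/wearing"))
```

From the agent's point of view there is one call and one response; which graph each part of the answer came from is an implementation detail of the gateway.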
Same approach as Part 1 (pulling commits from Jan to Mar). For Part 2, the key commits are from April to May.
refactor(graph): rename screen-graph to service-product-graph → declaration that the scope expands from screen-only to whole-service
feat(graph): add Api and Task node types to service-product-graph parser → Api / Task node types added
feat(mcp): add cross-graph tools to service-product-graph MCP
feat(graph): add SAME_ENTITY bridge edges between service-product-graph and code-graph
feat(graph): resolve Redis keys to code-graph boundary nodes → boundary resolution through Redis
feat(service-product-graph): add EventBridge EMITS_TO support + SAME_ENTITY bridge
feat(code-graph, service-product-graph): improve SAME_ENTITY boundary bridge coverage → 4-stage fallback locked in
feat(auto-review): SPG annotation auto-maintenance pipeline
feat(service-product-graph): add Task SAME_ENTITY bridge to code-graph → all three bridges in place
feat(spg): add mall repos to SPG indexing → mall repos indexed
feat(spg): add Go-aware parser
April 15 was the day "expansion + cross-graph tools + bridges" landed in close succession. Over the following week, "Redis / EventBridge / Task bridges / annotation auto-maintenance" stacked up one after another.
In particular, the annotation auto-maintenance pipeline on April 21 is where the "humans alone can't do this, but AI can" promise from Part 1 got cashed in. From that point on, annotation shifted from "humans grind through writing them" to "design the whole operation assuming AI writes them."
Solving the entry-point problem didn't make everything clean. A few issues remain.
The frontend side is annotated heavily. Backend / Go / batch are still thin. Some nodes will always be missing annotations. That's structural, and you can't drive it to zero. It's an ongoing operational issue.
The Page bridge in particular has cases where multiple annotation Pages map to the same boundary, which is structural and unavoidable. Adding more strategies got coverage to 100%, but guaranteeing that every one of those joins is also correct is much harder.
The graph only carries the fact that "this edge exists statically." How often that edge actually gets used in production isn't recorded. Piping production execution counts back into the static graph and surfacing dead-code edges as a separate signal: that's still untouched.
Every time a new repo enters production, the bridge normalization rules and per-repo patterns need adjusting. This is the annotation-graph-side version of Part 1's fourth issue (the cost of adding a new parser for every new boundary pattern).
In Part 1's closing note, I touched on the fact that the cortex side (the internal AI platform) bailed out of the code-graph approach early and bet on an annotation-based knowledge graph instead. The bail-out was fast enough that calling it "thrown away" wouldn't be wrong, but looking back across this whole series, the more accurate word is "evolved."
What it evolved into, in the end, is three graphs joined as peers:
Joined by SAME_ENTITY, served to the agent through MCP. The thing static analysis alone couldn't deliver (querying by meaning) became workable by reusing the db-graph success pattern and adding minimal annotations only at the boundaries.
And one more framing: paired with the AI Harness Series, Parts 1-6, this series sits as:
= the same philosophy (design without trusting AI), implemented under two different sets of constraints.
Thanks for reading this far.