# How to Use Graphify to Build a Queryable Knowledge Graph for Your AI Agent

> Source: <https://www.mindstudio.ai/blog/graphify-knowledge-graph-ai-agent/>
> Published: 2026-06-25 00:00:00+00:00

Graphify turns codebases and notes into a queryable knowledge graph, saving tokens and giving your agent persistent memory. Here's how to install and use it.

## Why Raw Files Are Breaking Your AI Agent

When you feed an AI agent a large codebase or a pile of notes, it faces an immediate problem: context windows have limits, and stuffing every file into a prompt is expensive, slow, and often inaccurate.

A queryable knowledge graph solves this. Instead of loading raw files, your agent queries a structured graph of entities and relationships — getting exactly the information it needs, nothing more. That’s what Graphify does, and it’s one of the most practical approaches to giving AI agents persistent, efficient memory.

This guide walks you through installing Graphify, parsing a codebase or notes collection into a knowledge graph, and wiring that graph into your AI agent so it can query it intelligently.

## What Is Graphify and What Problem Does It Solve

Graphify is a tool that ingests source material — code repositories, markdown notes, documentation — and converts it into a graph database where entities (files, functions, classes, concepts) become nodes and their relationships (imports, references, contains, links-to) become edges.

The result is a structured, queryable store that behaves very differently from a flat vector database.

### Knowledge Graphs vs. Vector Search

Most developers default to vector embeddings for RAG (retrieval-augmented generation). Vector search is useful, but it has a fundamental limitation: it finds semantically similar chunks, not structurally related ones.

If your agent needs to know which functions call a specific API endpoint, or which notes reference a specific concept across three different documents, vector search gives you fuzzy approximations. A knowledge graph gives you precise, traversable answers.

| Feature | Vector Search | Knowledge Graph |
|---|---|---|
| Query type | Semantic similarity | Structural/relational |
| Relationship traversal | No | Yes |
| Token cost per query | High (returns chunks) | Low (returns targeted data) |
| Precision | Approximate | Exact |
| Best for | General Q&A | Code navigation, dependency mapping |

For codebases especially, a knowledge graph is the right tool. You want to know the actual call graph, not a semantically similar one.
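To make the difference concrete, here is a minimal sketch of a structural lookup over a toy in-memory graph. It is not Graphify’s storage format; the node names, file paths, and the `incoming` helper are all made up for illustration.

```javascript
// A toy graph: nodes keyed by id, edges stored as (from, type, to) triples.
// Every name and path below is hypothetical.
const nodes = new Map([
  ['checkout', { type: 'Function', filePath: 'src/checkout.js' }],
  ['refund', { type: 'Function', filePath: 'src/refund.js' }],
  ['chargeCard', { type: 'Function', filePath: 'src/payments.js' }],
  ['sendReceipt', { type: 'Function', filePath: 'src/email.js' }],
]);

const edges = [
  { from: 'checkout', type: 'CALLS', to: 'chargeCard' },
  { from: 'checkout', type: 'CALLS', to: 'sendReceipt' },
  { from: 'refund', type: 'CALLS', to: 'chargeCard' },
];

// Return every node that has an edge of the given type pointing at `target`.
function incoming(target, edgeType) {
  return edges
    .filter((edge) => edge.to === target && edge.type === edgeType)
    .map((edge) => ({ name: edge.from, ...nodes.get(edge.from) }));
}

// "Which functions call chargeCard?" has an exact answer in the graph.
console.log(incoming('chargeCard', 'CALLS'));
```

The answer comes from the edges themselves, so it is either present or absent; there is no similarity threshold to tune.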

### Where Graphify Fits

Graphify sits between your raw source material and your AI agent. It:

- Parses files and extracts entities and relationships
- Stores them in a graph database (typically Neo4j or an in-memory graph)
- Exposes a query interface your agent can call
- Returns structured, minimal responses — drastically fewer tokens than raw file dumps

This creates something close to persistent memory for your agent: a structured representation of your codebase or notes that survives between sessions and can be queried on demand.

## Prerequisites Before You Start

Before installing Graphify, make sure you have the following in place.

**Runtime requirements:**

- Node.js 18+ (or Python 3.10+ depending on which Graphify implementation you use)
- npm or pip, depending on your environment
- A graph database backend (Neo4j Community Edition is free and works well)

**Optional but recommended:**

- Docker, for running Neo4j without a local install
- An API key for your preferred LLM (GPT-4, Claude, etc.) if you want the agent to interpret graph query results
- Git, if you’re pointing Graphify at a code repository

**Knowledge you should have:**

- Basic command line familiarity
- Understanding of how your AI agent framework handles tool calls or function calling

If you’re not using a graph database backend, some Graphify setups support lightweight in-memory graphs using libraries like `graphology` (JavaScript) or `networkx` (Python). These work fine for smaller projects but don’t persist between runs.

## Step 1: Install Graphify and Set Up Your Graph Backend

### Install the Package

For a Node.js environment:

```
npm install graphify-core
```

For a Python environment:

```
pip install graphify-agent
```

The package names vary depending on the specific Graphify distribution you’re using. Check the repository for the canonical package name — there are a few forks and related projects in this space.

### Start Neo4j (Recommended Backend)

The quickest way to get Neo4j running is via Docker:

```
docker run \
  --name graphify-neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/yourpassword \
  neo4j:latest
```

Once it starts, the Neo4j browser is available at `http://localhost:7474`. You don’t need to do anything in the UI; Graphify will handle schema creation.

### Configure Your Connection

Create a `.env` file in your project root:

```
GRAPH_BACKEND=neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=yourpassword
```

If you’re using an in-memory backend for testing, set `GRAPH_BACKEND=memory`; no database is required.
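A small validation step can catch a missing variable before parsing starts. The sketch below assumes the variable names from the `.env` file above; `loadGraphConfig` is a hypothetical helper, not part of Graphify, and falling back to the in-memory backend when `GRAPH_BACKEND` is unset is a choice made for this example.

```javascript
// Hypothetical helper: turn environment variables into a backend config,
// failing early when a Neo4j setting is missing.
function loadGraphConfig(env) {
  // Assumption for this sketch: default to the in-memory backend.
  const backend = env.GRAPH_BACKEND || 'memory';
  if (backend === 'memory') return { backend };

  if (backend !== 'neo4j') {
    throw new Error(`Unsupported GRAPH_BACKEND: ${backend}`);
  }

  const required = ['NEO4J_URI', 'NEO4J_USER', 'NEO4J_PASSWORD'];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Shape mirrors the constructor options used later in this guide.
  return {
    backend,
    connectionString: env.NEO4J_URI,
    auth: { user: env.NEO4J_USER, password: env.NEO4J_PASSWORD },
  };
}
```

Call it as `loadGraphConfig(process.env)`. Note that Node.js does not read `.env` files on its own: start it with `node --env-file=.env` (Node 20.6 or later) or use a loader such as `dotenv`.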

## Step 2: Parse Your Source Material Into a Graph

This is where Graphify earns its keep. You point it at a directory and it walks the files, extracts entities and relationships, and writes them to your graph backend.

### Parsing a Codebase

``` js
import { Graphify } from 'graphify-core';

const graph = new Graphify({
  backend: 'neo4j',
  connectionString: process.env.NEO4J_URI,
  auth: {
    user: process.env.NEO4J_USER,
    password: process.env.NEO4J_PASSWORD
  }
});

await graph.parse({
  source: './src',
  type: 'codebase',
  language: 'javascript', // or 'typescript', 'python', etc.
  include: ['**/*.js', '**/*.ts'],
  exclude: ['node_modules/**', 'dist/**']
});

console.log('Graph built successfully');
```

What happens during parsing:

- **Files** become nodes with type `File`
- **Functions and classes** become nodes with their own types
- **Import statements** become `IMPORTS` edges between files
- **Function calls** become `CALLS` edges between functions
- **Class inheritance** becomes `EXTENDS` edges

The exact node/edge schema depends on the language parser Graphify uses. Most implementations support JavaScript, TypeScript, and Python out of the box.
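To give a feel for the extraction step, here is a deliberately simplified sketch that turns ES module `import` statements into `IMPORTS` edges with a regular expression. This is not how Graphify’s parsers work internally (a real parser would walk a syntax tree); `extractImportEdges` and the edge shape are assumptions for illustration.

```javascript
// Simplified illustration: match static ES module imports with a regex.
// It ignores dynamic import() calls, require(), and imports inside comments.
const IMPORT_PATTERN = /import\s+(?:[\w*\s{},]+\s+from\s+)?['"]([^'"]+)['"]/g;

// Emit one IMPORTS edge per import statement found in `source`.
function extractImportEdges(filePath, source) {
  const edges = [];
  for (const match of source.matchAll(IMPORT_PATTERN)) {
    edges.push({ from: filePath, type: 'IMPORTS', to: match[1] });
  }
  return edges;
}

const sample = `
import { charge } from './payments.js';
import express from 'express';
import './polyfills.js';
`;

console.log(extractImportEdges('src/checkout.js', sample));
```

Each edge records the importing file, the relationship type, and the module specifier exactly as written; resolving specifiers to real file nodes is a separate step this sketch leaves out.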

### Parsing Markdown Notes

Knowledge graphs aren’t just for code. If you have a notes collection — Obsidian vaults, Notion exports, documentation — you can parse those too.

``` js
await graph.parse({
  source: './notes',
  type: 'markdown',
  include: ['**/*.md'],
  extractLinks: true,       // follow [[wikilinks]] and [markdown](links)
  extractHeadings: true,    // create nodes for H2/H3 headings
  extractEntities: true,    // use NER to extract people, places, concepts
});
```

With `extractEntities: true`, Graphify uses a lightweight NLP pass to pull named entities out of your notes and create nodes for them. Two notes that both mention “payment gateway integration” will both have edges pointing to a `Concept` node for that phrase.

This is where the queryable knowledge graph becomes genuinely powerful for note-taking workflows — you can ask “which notes discuss the payment gateway?” and get a precise answer, not a fuzzy vector match.
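Link extraction is the easiest part of this to picture. The sketch below pulls `[[wikilinks]]` and `[text](target)` links out of a note and turns each into an edge. It is an illustration rather than Graphify’s implementation: the `LINKS_TO` edge type and the `extractLinkEdges` helper are assumptions, and the regular expressions cover only the common cases.

```javascript
// [[Target]], [[Target|alias]] and [[Target#heading]] all resolve to "Target".
const WIKILINK = /\[\[([^\]|#]+)(?:[|#][^\]]*)?\]\]/g;
// [text](target), skipping images, which start with "!".
const MARKDOWN_LINK = /(?<!!)\[[^\]]*\]\(([^)\s]+)[^)]*\)/g;

// Return one LINKS_TO edge per distinct link target in the note.
function extractLinkEdges(notePath, markdown) {
  const targets = new Set();
  for (const match of markdown.matchAll(WIKILINK)) targets.add(match[1].trim());
  for (const match of markdown.matchAll(MARKDOWN_LINK)) targets.add(match[1]);
  return [...targets].map((to) => ({ from: notePath, type: 'LINKS_TO', to }));
}

const note = `
We route payments through [[Payment Gateway|the gateway]] and [[Stripe]].
See [setup](./setup.md) for details. ![diagram](./flow.png)
[[Stripe]] webhooks are covered separately.
`;

console.log(extractLinkEdges('notes/payments.md', note));
```

Duplicate mentions collapse into a single edge, so a note that links to the same target three times still contributes one relationship.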

### Incremental Updates

You don’t need to re-parse the entire source every time. Graphify tracks file hashes and only reprocesses files that have changed:

``` js
await graph.sync({
  source: './src',
  type: 'codebase',
  language: 'typescript'
});
```

Run this on a schedule or as part of a CI pipeline to keep the graph current.
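The idea behind hash-based change detection can be sketched in a few lines. This is not Graphify’s sync code; `fingerprint` and `diffFiles` are hypothetical helpers, and file contents are passed in as strings so the example runs without touching the disk.

```javascript
// CommonJS require is used so the snippet runs as a plain Node.js script.
const { createHash } = require('node:crypto');

// Hash file contents so any change in content changes the fingerprint.
function fingerprint(content) {
  return createHash('sha256').update(content).digest('hex');
}

// Compare current contents against the hashes saved by the previous run.
// `files` maps path -> content; `previousHashes` maps path -> hex digest.
function diffFiles(files, previousHashes) {
  const changed = [];
  const nextHashes = {};
  for (const [path, content] of Object.entries(files)) {
    nextHashes[path] = fingerprint(content);
    // New files have no previous hash, so they count as changed too.
    if (previousHashes[path] !== nextHashes[path]) changed.push(path);
  }
  const removed = Object.keys(previousHashes).filter((path) => !(path in files));
  return { changed, removed, nextHashes };
}
```

Only the paths in `changed` need re-parsing, paths in `removed` should have their nodes deleted, and `nextHashes` is what you would persist for the next run.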

## Step 3: Query the Graph From Your AI Agent

Once the graph is built, your agent needs a way to query it. Graphify exposes a query interface that accepts either structured graph queries or natural language (which it translates to Cypher or Gremlin internally).

### Direct Graph Queries

For precise structural queries:

``` js
const result = await graph.query({
  type: 'cypher',
  query: `
    MATCH (fn:Function)-[:CALLS]->(dep:Function)
    WHERE fn.name = 'processPayment'
    RETURN dep.name, dep.filePath
  `
});
```

This returns every function that `processPayment` calls, with file paths, which is exactly what an agent needs to understand the call graph around a piece of code.

### Natural Language Queries

For agent workflows where you want the LLM to generate queries dynamically:

``` js
const result = await graph.queryNL({
  question: 'Which files import the authentication module?',
  model: 'gpt-4o' // Used to translate NL to graph query
});
```

Graphify sends the schema and the question to the LLM, receives a Cypher query back, executes it, and returns the result. This adds a small latency cost but makes the interface much easier to use from within an agent loop.
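The translation step described above can be pictured roughly as follows. This is a sketch under assumptions, not Graphify’s actual prompt: the schema object, `buildCypherPrompt`, and `isReadOnly` are all hypothetical, and the read-only check is a crude keyword filter you might add yourself before executing a generated query.

```javascript
// Hypothetical schema description; a real one would be derived from the graph.
const schema = {
  nodes: { File: ['name', 'filePath'], Function: ['name', 'filePath'] },
  edges: ['(File)-[:IMPORTS]->(File)', '(Function)-[:CALLS]->(Function)'],
};

// Build the text sent to the model: schema first, then the question.
function buildCypherPrompt(schema, question) {
  const nodeLines = Object.entries(schema.nodes).map(
    ([label, props]) => `- ${label} (${props.join(', ')})`
  );
  return [
    'Translate the question into one read-only Cypher query.',
    'Node labels and properties:',
    ...nodeLines,
    'Relationships:',
    ...schema.edges.map((edge) => `- ${edge}`),
    `Question: ${question}`,
    'Return only the Cypher query.',
  ].join('\n');
}

// Conservative guard: reject generated queries containing write keywords.
// It can produce false positives, for example on a property named "set".
function isReadOnly(cypher) {
  return !/\b(CREATE|MERGE|DELETE|SET|REMOVE|DROP)\b/i.test(cypher);
}

console.log(buildCypherPrompt(schema, 'Which files import the authentication module?'));
```

The prompt contains labels, properties, and relationship types but no file contents, which is why only the schema and the question need to leave your machine.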

### Exposing Queries as Agent Tools

The most effective pattern is to wrap graph queries as tool definitions your agent can call. Here’s an example for an OpenAI-compatible function calling setup:

``` js
const tools = [
  {
    type: 'function',
    function: {
      name: 'query_codebase_graph',
      description: 'Query the codebase knowledge graph to find relationships between files, functions, and classes.',
      parameters: {
        type: 'object',
        properties: {
          question: {
            type: 'string',
            description: 'A natural language question about the codebase structure or dependencies.'
          }
        },
        required: ['question']
      }
    }
  }
];

// In your tool execution handler:
async function executeTool(name, args) {
  if (name === 'query_codebase_graph') {
    return await graph.queryNL({ question: args.question });
  }
}
```

Now your agent can autonomously decide when to consult the knowledge graph, what to ask, and how to use the result — without you hardcoding which files it should look at.
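On the agent side, each tool call in an OpenAI-compatible response carries an `id` and a `function` object whose `arguments` field is a JSON string. A minimal sketch of turning one call into the `tool` message the model expects back might look like this; `handleToolCall` is a hypothetical helper, and the executor is passed in so the example runs without a graph backend.

```javascript
// Handle one tool call from an OpenAI-compatible chat completion response.
// `executeTool` is injected; in practice it would be the handler shown above.
async function handleToolCall(toolCall, executeTool) {
  let content;
  try {
    // Arguments arrive as a JSON string, not as a parsed object.
    const args = JSON.parse(toolCall.function.arguments);
    const result = await executeTool(toolCall.function.name, args);
    // Unknown tool names may yield undefined; serialize that as null.
    content = JSON.stringify(result ?? null);
  } catch (err) {
    // Report the failure to the model instead of crashing the agent loop.
    content = JSON.stringify({ error: err.message });
  }
  // The reply must echo the call id so the model can match it to the request.
  return { role: 'tool', tool_call_id: toolCall.id, content };
}
```

Append the returned message to the conversation and call the model again. Returning errors as content, rather than throwing, gives the model a chance to rephrase its question.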

## Step 4: Manage Token Usage and Result Size

One of the main benefits of a knowledge graph is token savings, but you need to be deliberate about what your queries return.

### Limit Result Depth

Graph traversals can return enormous amounts of data if you’re not careful. Use depth limits:

``` js
const result = await graph.query({
  type: 'cypher',
  query: `
    MATCH path = (start:File {name: 'api.ts'})-[:IMPORTS*1..3]->(dep:File)
    RETURN [node IN nodes(path) | node.name] as chain
  `
});
```

The `*1..3` bound limits traversal to between one and three hops. Without it, a query on a large codebase could return thousands of nodes.
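The effect of a hop limit is easy to see on a small in-memory graph. The sketch below is a plain breadth-first traversal, not what Neo4j does internally (Cypher returns one row per matching path, while this version lists each file once); the import map and `reachableWithin` are made up for illustration.

```javascript
// Hypothetical import graph: file -> files it imports.
const imports = {
  'api.ts': ['routes.ts', 'auth.ts'],
  'routes.ts': ['handlers.ts'],
  'auth.ts': ['crypto.ts'],
  'handlers.ts': ['db.ts'],
  'db.ts': ['config.ts'],
};

// Breadth-first traversal that stops after `maxDepth` hops from `start`.
function reachableWithin(start, maxDepth) {
  const seen = new Set([start]);
  let frontier = [start];
  const found = [];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const file of frontier) {
      for (const dep of imports[file] ?? []) {
        if (seen.has(dep)) continue; // visit each file once
        seen.add(dep);
        found.push({ file: dep, depth });
        next.push(dep);
      }
    }
    frontier = next;
  }
  return found;
}

console.log(reachableWithin('api.ts', 3));
```

With a limit of three, `config.ts` (four hops from `api.ts`) is left out. On a real codebase the number of reachable files often grows quickly with each extra hop, which is why the bound matters.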

### Return Only What You Need

Avoid `RETURN *` in production. Be explicit:

```
// Bad: returns all node properties
RETURN node

// Good: returns only what the agent needs
RETURN node.name, node.filePath, node.description
```

This alone can cut token usage by 60–80% compared to returning raw file contents.
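If a query you don’t control returns whole nodes, you can apply the same projection on the client before handing results to the model. A minimal sketch, assuming results arrive as plain objects; `project` and the sample record are hypothetical.

```javascript
// Keep only the named fields from each record, skipping absent fields.
function project(records, fields) {
  return records.map((record) =>
    Object.fromEntries(
      fields.filter((field) => field in record).map((field) => [field, record[field]])
    )
  );
}

// Made-up record standing in for a full node with a large source property.
const fullRecords = [
  {
    name: 'processPayment',
    filePath: 'src/payments.js',
    description: 'Charges the customer card',
    source: 'function processPayment(order) { /* ... */ }\n'.repeat(25),
  },
];

const slimRecords = project(fullRecords, ['name', 'filePath', 'description']);

// Serialized length is only a rough proxy for token count.
console.log('full:', JSON.stringify(fullRecords).length);
console.log('slim:', JSON.stringify(slimRecords).length);
```

Projecting in the query itself is still preferable, since it also avoids transferring the unused properties from the database.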

### Cache Frequent Queries

If your agent repeatedly asks the same structural questions, cache the results:

``` js
const cache = new Map();

async function cachedQuery(question) {
  if (cache.has(question)) return cache.get(question);
  const result = await graph.queryNL({ question });
  cache.set(question, result);
  return result;
}
```

For long-running agents, a Redis cache with a TTL tied to your sync schedule works well.
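Before reaching for Redis, the same expiry idea can be tried in process. The sketch below is an in-memory cache with a time-to-live; `createTtlCache` is a hypothetical helper, and the clock is injectable so expiry can be tested without waiting.

```javascript
// In-memory cache whose entries expire `ttlMs` milliseconds after being set.
// `now` defaults to Date.now and can be replaced with a fake clock in tests.
function createTtlCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key) {
      const entry = entries.get(key);
      if (!entry) return undefined;
      if (now() >= entry.expiresAt) {
        entries.delete(key); // drop stale entries lazily on read
        return undefined;
      }
      return entry.value;
    },
    set(key, value) {
      entries.set(key, { value, expiresAt: now() + ttlMs });
    },
    // Call after a graph sync so answers never outlive the data behind them.
    clear() {
      entries.clear();
    },
  };
}
```

Setting `ttlMs` to your sync interval, or calling `clear()` whenever a sync finishes, keeps cached answers from describing a graph that has since changed.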

## Step 5: Add the Graph to an Automated Workflow

A knowledge graph really pays off when it’s part of an automated workflow — not just a tool you invoke manually.

Common automated patterns:

**Code review agent:** Parse the diff of a pull request, query the knowledge graph to understand which modules are affected by the changed functions, and generate a targeted review.

**Documentation agent:** When a function changes, query the graph to find all docs that reference it, then flag or update them automatically.

**Onboarding agent:** Answer questions from new developers about where things live in the codebase, drawing answers from the graph rather than raw files.

**Note summarization agent:** Across a large notes collection, query for all notes connected to a concept, then synthesize a summary.

Each of these follows the same pattern: trigger → graph query → LLM reasoning → output.
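That shared pattern can be written down as a small skeleton with each stage passed in as a function. This is a sketch, not a prescribed architecture: `runWorkflow` and every stage below are placeholders, with stubs standing in for the real graph query and model call.

```javascript
// trigger -> graph query -> LLM reasoning -> output, with injected stages.
async function runWorkflow(event, { buildQuestion, queryGraph, askModel, deliver }) {
  const question = buildQuestion(event);
  const graphResult = await queryGraph(question);
  const answer = await askModel({ event, graphResult });
  await deliver(answer);
  return answer;
}

// Stub stages for a code review agent; every function here is a placeholder.
const reviewStages = {
  buildQuestion: (event) => `Which functions call ${event.changedFunction}?`,
  queryGraph: async () => [{ name: 'checkout', filePath: 'src/checkout.js' }],
  askModel: async ({ event, graphResult }) =>
    `Review ${graphResult.length} caller(s) of ${event.changedFunction}.`,
  deliver: async (answer) => console.log(answer),
};

runWorkflow({ changedFunction: 'processPayment' }, reviewStages);
```

Swapping the stubs for a real graph query, a model call, and a Slack or pull request comment step turns the skeleton into any of the agents listed above without changing its shape.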

## How MindStudio Connects to This

If you want to run these workflows without managing infrastructure yourself, [MindStudio](https://mindstudio.ai) gives you a no-code environment to build agents that call external tools — including a Graphify knowledge graph.

The most direct integration point is MindStudio’s Agent Skills Plugin (`@mindstudio-ai/agent`), which lets any AI agent call MindStudio capabilities as typed method calls. But you can also build the full workflow inside MindStudio’s visual builder:

- **Set up a webhook trigger:** fire when a PR is opened, a file is saved, or a schedule runs
- **Add a custom function block:** call your Graphify query endpoint via HTTP
- **Feed the result to an LLM block:** choose from 200+ models available natively, no separate API keys needed
- **Route the output:** post to Slack, update a Notion doc, send an email, or write back to your repo

The advantage here is that MindStudio handles the orchestration layer — retries, authentication, multi-step logic — while Graphify handles the knowledge layer. You’re not writing glue code; you’re connecting blocks.

For teams who want the knowledge graph querying behavior without building a custom agent framework, MindStudio’s visual builder can have a working workflow in under an hour. You can [try MindStudio free at mindstudio.ai](https://mindstudio.ai).

## Troubleshooting Common Issues

### The parser hangs on large repositories

Large monorepos can overwhelm the parser if you don’t exclude irrelevant directories. Always exclude `node_modules`, `dist`, `build`, `.git`, and test fixture directories. Use the `maxFileSize` option to skip oversized files, which are usually binary or generated:

``` js
await graph.parse({
  source: './src',
  type: 'codebase',
  exclude: ['node_modules/**', 'dist/**', '**/*.test.ts'],
  maxFileSize: 500000 // bytes, skip files over 500KB
});
```

### Natural language queries return wrong results

The NL-to-query translation step depends heavily on having a good schema description. Make sure your Graphify config includes human-readable labels for your node and edge types. If the schema is sparse or uses cryptic abbreviations, the LLM will generate poor queries.

### Neo4j runs out of memory

For large codebases, increase Neo4j’s heap allocation in your Docker command:

```
-e NEO4J_server_memory_heap_initial__size=1G \
-e NEO4J_server_memory_heap_max__size=4G
```

### Queries are slow

Check that you have indexes on frequently queried properties. Graphify should create these automatically, but verify with:

```
SHOW INDEXES
```

If indexes are missing on `File.name` or `Function.name`, create them manually:

```
CREATE INDEX file_name FOR (f:File) ON (f.name);
CREATE INDEX function_name FOR (fn:Function) ON (fn.name);
```

## Frequently Asked Questions

### What is a knowledge graph in the context of AI agents?

A knowledge graph is a structured representation of entities and the relationships between them. For AI agents, it acts as a persistent, queryable memory store. Instead of the agent loading raw files or chunks of text, it queries the graph for specific nodes and edges — getting precise answers with minimal token usage. The [concept of knowledge graphs](https://en.wikipedia.org/wiki/Knowledge_graph) predates AI agents by decades, but they’ve become especially useful as a complement to LLMs.

### How is a knowledge graph different from a vector database?

Vector databases store embeddings and return results based on semantic similarity. They’re good for “find me content similar to this.” Knowledge graphs store structured relationships and return results based on graph traversal. They’re good for “show me everything connected to this specific node in this specific way.” Most production agents benefit from both: vector search for broad semantic retrieval, graph queries for precise structural lookups.

### Does Graphify work with private codebases?

Yes. Graphify runs entirely locally (or on your own infrastructure). Nothing is sent to external services during the parsing step. If you use the `queryNL` feature, a schema description and your natural language question are sent to your chosen LLM provider, but not your actual code. You can also write raw Cypher queries to avoid any external API calls entirely.

### How much does Graphify reduce token usage?

It depends heavily on your use case, but returning graph query results instead of raw file contents typically reduces tokens per query by 70–90%. A function’s call graph returned as a structured list might use 200 tokens; the raw files containing those functions might use 20,000. The savings compound across an agent’s full reasoning loop.

### Can I use Graphify with any AI agent framework?

Graphify exposes a standard HTTP API and JavaScript/Python SDK, so it works with any framework that supports tool/function calling: LangChain, LlamaIndex, CrewAI, AutoGen, Claude Code, and custom implementations. The key is wrapping your Graphify queries as tool definitions that your agent framework understands.

### What graph databases does Graphify support?

Most Graphify implementations support Neo4j as the primary backend, with in-memory options (using `graphology` or `networkx`) for development and testing. Some forks add support for ArangoDB and Amazon Neptune. For production workloads with large codebases, Neo4j is the most mature and well-supported option.

## Key Takeaways

- **Knowledge graphs give AI agents precise, structured memory:** not fuzzy semantic matches, but exact relationships between entities in your codebase or notes
- **Graphify handles the parsing and storage layer** so you don’t have to build a custom extractor for every file type
- **Token savings are significant:** returning graph results instead of raw files typically cuts per-query token usage by 70–90%
- **The agent tool pattern is the most flexible integration:** wrap your graph queries as tool definitions and let the agent decide when and how to use them
- **Incremental syncing keeps the graph current** without requiring full re-parses on every run
- **MindStudio’s visual builder** lets you orchestrate agents that query Graphify endpoints without managing the workflow infrastructure yourself; try it at [mindstudio.ai](https://mindstudio.ai)
