Source: dev.to · topic: large-language-models

RAG for Code: Why Chunking by Function Beats Chunking by Lines

A developer built a retrieval-augmented generation (RAG) system for code and found that chunking by function boundaries dramatically outperformed line-based chunking. By using a parser to extract complete functions, methods, and classes, the system retrieved meaningful code units that allowed an LLM to answer questions accurately. The approach improved retrieval quality without changing the underlying model.

4 min read · published Jun 30, 2026

I built a retrieval system over a codebase so an LLM could answer questions about it, and my first version was nearly useless. The problem was not the model or the embeddings. It was how I cut the code into chunks. Splitting source by line count shreds the very structure that makes code meaningful. Here is why function-aware chunking works so much better, and how to do it.

The standard RAG tutorial says: split your documents into fixed-size chunks (say 500 tokens), embed each chunk, retrieve the closest ones to the query. For prose, fine. For code, this is destructive.

A 500-token window does not respect function boundaries. You end up with chunks like "the last third of transfer() and the first half of approve()." Neither function is complete. The embedding represents a fragment that means nothing on its own, and when you retrieve it, you hand the model half a function with no signature and no context.

My early system would confidently answer questions about functions it had only seen the middle of. The retrieval was the bottleneck, and the chunking was the cause.
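
To make the failure concrete, here is a minimal sketch of naive fixed-size chunking. It is an illustration only: it counts lines rather than tokens to keep the example short, and the helper name chunkByLines and the sample source are hypothetical, not taken from the original system.

```typescript
// Naive fixed-size chunking: cut every `size` lines, ignoring syntax entirely.
function chunkByLines(source: string, size: number): string[] {
  const lines = source.split("\n");
  const chunks: string[] = [];
  for (let i = 0; i < lines.length; i += size) {
    chunks.push(lines.slice(i, i + size).join("\n"));
  }
  return chunks;
}

// Hypothetical sample file containing two small functions (8 lines total).
const sample = [
  "function transfer(to: string, amount: number) {",
  "  checkBalance(amount);",
  "  debit(amount);",
  "  credit(to, amount);",
  "}",
  "function approve(spender: string, amount: number) {",
  "  allowances[spender] = amount;",
  "}",
].join("\n");

// With a window of 3 lines, the middle chunk holds the tail of transfer()
// and the opening line of approve(): neither function is complete.
const naiveChunks = chunkByLines(sample, 3);
```

The middle chunk here mixes the end of one function with the signature of the next, which is exactly the kind of fragment that embeds poorly.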

Code has natural units: functions, methods, classes, contracts. Those are the units a developer reasons about, so those are the units to chunk by. One function, one chunk. The chunk includes the full signature, the body, and ideally the doc comment above it.

interface CodeChunk {
  name: string;        // function or method name
  signature: string;   // full signature for context
  body: string;        // the complete function body
  filePath: string;    // where it lives
  startLine: number;
}

Now each chunk is a complete, meaningful thing. Retrieve it and the model gets a whole function it can reason about, with its name and signature intact.

For Solidity or TypeScript, you can get a long way with a parser rather than regex. For TypeScript I use the compiler API or a tool like ts-morph; for Solidity, a proper parser that gives you the AST. The point is to walk the syntax tree and emit one chunk per function-level node, rather than slicing the raw text.

A simplified shape of the extractor:

import { Project } from "ts-morph";

function chunkByFunction(filePath: string): CodeChunk[] {
  const project = new Project();
  const source = project.addSourceFileAtPath(filePath);
  const chunks: CodeChunk[] = [];

  for (const fn of source.getFunctions()) {
    const text = fn.getText();   // the whole function, intact
    const body = fn.getBody();   // undefined for overloads and ambient declarations
    // Signature = the declaration text that precedes the body block.
    const signature = body
      ? text.slice(0, body.getStart() - fn.getStart()).trim()
      : text;
    chunks.push({
      name: fn.getName() ?? "anonymous",
      signature,
      body: text,
      filePath,
      startLine: fn.getStartLineNumber(),
    });
  }
  // also walk classes/methods the same way
  return chunks;
}

Each function comes out whole. No more half-functions.

I run this entirely on a local model so a private codebase never leaves my machine. Ollama serves an embedding model; I embed each function chunk and store the vectors:

import { Ollama } from "ollama";
const ollama = new Ollama();

async function embed(text: string): Promise<number[]> {
  const r = await ollama.embeddings({ model: "nomic-embed-text", prompt: text });
  return r.embedding;
}

I embed `${chunk.name}\n${chunk.signature}\n${chunk.body}` so the function name and signature are part of the vector, not just the body. That makes name-based queries ("what does withdraw do") retrieve well, because the name is in the embedded text.
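
As a small sketch of that step, the helper below builds the text that gets embedded. The names EmbeddableChunk and toEmbeddingText and the example chunk are hypothetical; only the name, signature, body ordering comes from the article.

```typescript
// Hypothetical type: just the three CodeChunk fields that feed the embedding.
interface EmbeddableChunk {
  name: string;
  signature: string;
  body: string;
}

// Put the name and signature ahead of the body so they are part of the embedded text.
function toEmbeddingText(chunk: EmbeddableChunk): string {
  return `${chunk.name}\n${chunk.signature}\n${chunk.body}`;
}

// Made-up example chunk, for illustration only.
const withdrawChunk: EmbeddableChunk = {
  name: "withdraw",
  signature: "function withdraw(amount: number): void",
  body: "function withdraw(amount: number): void {\n  balance -= amount;\n}",
};

const embeddingText = toEmbeddingText(withdrawChunk);
```

The resulting string is what would be passed to embed(), so a query that mentions the function by name shares tokens with the first line of the embedded text.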

After switching to function chunks, the same questions that used to get fragmented, half-wrong answers got crisp ones. "How does this contract handle reentrancy in withdrawals?" now retrieves the complete withdraw function plus the modifier it uses, and the model can actually reason about the checks-effects-interactions order because it can see the whole thing.

The model did not get smarter. The retrieval got honest. I was handing it complete units of meaning instead of arbitrary text windows.
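
The article does not say how the vectors are stored or searched. One simple possibility, sketched here as an assumption, is a brute-force cosine-similarity search over an in-memory array; IndexedChunk, cosineSimilarity, and topK are hypothetical names, and the two-dimensional vectors are toy values rather than real embeddings.

```typescript
// Hypothetical record: a chunk identifier plus its embedding vector.
interface IndexedChunk {
  id: string;
  vector: number[];
}

// Cosine similarity; missing components count as 0, and a zero vector scores 0.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks most similar to the query vector, best first.
function topK(index: IndexedChunk[], query: number[], k: number): IndexedChunk[] {
  return index
    .map((entry) => ({ entry, score: cosineSimilarity(entry.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.entry);
}

// Toy index: one entry per function chunk.
const toyIndex: IndexedChunk[] = [
  { id: "withdraw", vector: [1, 0] },
  { id: "deposit", vector: [0, 1] },
  { id: "nonReentrant", vector: [0.9, 0.1] },
];

const nearest = topK(toyIndex, [1, 0], 2).map((c) => c.id);
```

Because every entry is a whole function, whatever comes back from a search like this is a complete unit rather than a window that starts mid-body.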

One thing I added later: for a retrieved function, I also pull in the one-line signatures of functions that call it. That gives the model a sense of how the function is used without bloating the chunk. It is cheap context that often answers the follow-up question before it is asked.
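
A rough sketch of the caller-signature idea follows. How callers are tracked is not described in the article, so the FunctionInfo shape with a calls list, the helper names, and the sample functions are all assumptions made for illustration.

```typescript
// Hypothetical shape: each function records the names of the functions it calls.
interface FunctionInfo {
  name: string;
  signature: string;
  body: string;
  calls: string[];
}

// Collect the one-line signatures of every other function that calls `target`.
// Self-recursion is skipped because the target's own signature is already in its body.
function callerSignatures(all: FunctionInfo[], target: string): string[] {
  return all
    .filter((fn) => fn.name !== target && fn.calls.includes(target))
    .map((fn) => fn.signature);
}

// Build the context for the model: the full function, then one comment line per caller.
function buildContext(all: FunctionInfo[], target: string): string {
  const fn = all.find((f) => f.name === target);
  if (!fn) return "";
  const callers = callerSignatures(all, target);
  const callerBlock =
    callers.length > 0
      ? "\n\n// Called by:\n" + callers.map((s) => `// ${s}`).join("\n")
      : "";
  return fn.body + callerBlock;
}

// Made-up functions, for illustration only.
const functions: FunctionInfo[] = [
  {
    name: "withdraw",
    signature: "function withdraw(amount: number): void",
    body: "function withdraw(amount: number): void { /* ... */ }",
    calls: [],
  },
  {
    name: "exit",
    signature: "function exit(): void",
    body: "function exit(): void { withdraw(balance); }",
    calls: ["withdraw"],
  },
  {
    name: "deposit",
    signature: "function deposit(amount: number): void",
    body: "function deposit(amount: number): void { /* ... */ }",
    calls: [],
  },
];

const withdrawContext = buildContext(functions, "withdraw");
```

Only signatures are appended, not caller bodies, which keeps the extra context to one line per caller.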

RAG quality is mostly retrieval quality, and retrieval quality is mostly chunking quality. The instinct to chunk by size comes from text-document tutorials, but code is not prose. It has structure, and that structure is exactly what carries the meaning. Chunk along the structure, embed the name and signature with the body, and run it locally if the code is private. The embeddings and the model were never the problem. The scissors were.
