Building a Local AI Code Reviewer with Ollama That Catches Bugs Before Your Team


Your teammates are busy. Your CI is green but shallow. And the bug you just staged is the kind a second pair of eyes would catch in five seconds. So let's build that second pair of eyes: a small TypeScript CLI that feeds your staged git diff to a local LLM and returns structured findings, before anyone else sees your code. No API key, no cloud, no leaking your private repo to a vendor.

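The whole tool is one loop: grab the staged changes with `git diff --cached`, send them to a local model, validate the findings, and print them, all triggered from a `pre-commit` hook.

Everything runs locally against `qwen2.5-coder:7b`. You'll need Ollama running (`ollama serve`) and the model pulled (`ollama pull qwen2.5-coder:7b`).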

The reviewer should look at exactly what you're about to commit, nothing more. That's `--cached` (staged changes only):

import { execSync } from "node:child_process";

function getStagedDiff(): string {
  return execSync("git diff --cached --no-color -U3", {
    encoding: "utf8",
    maxBuffer: 10 * 1024 * 1024,
  });
}

A few choices that matter:

- `--no-color` keeps ANSI escape codes out of the prompt.
- `-U3` gives three lines of context around each hunk. Enough for the model to reason, not so much that you blow the context window.
- `maxBuffer` bumps Node's default 1MB cap so big diffs don't throw.

If the diff is empty, there's nothing to review:

const diff = getStagedDiff();
if (diff.trim().length === 0) {
  console.log("No staged changes. Stage something first with `git add`.");
  process.exit(0);
}
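When a commit touches many files, the context-window concern above comes back even with `-U3`. One option, sketched here as an assumption and not something the original CLI does, is to split the diff into one chunk per file and review each chunk on its own. The helper name `splitDiffByFile` is hypothetical; it relies only on the fact that every file section in `git diff` output begins with a `diff --git` line.

```typescript
// Hypothetical helper: split a unified diff into one chunk per file.
// Each file section in `git diff` output starts with a "diff --git" line.
function splitDiffByFile(diff: string): string[] {
  const chunks: string[] = [];
  let current: string[] = [];

  for (const line of diff.split("\n")) {
    // A new header closes the chunk collected so far.
    if (line.startsWith("diff --git ") && current.length > 0) {
      chunks.push(current.join("\n"));
      current = [];
    }
    current.push(line);
  }

  // Keep the final chunk unless it is only blank lines (e.g. empty input).
  if (current.some((l) => l.trim().length > 0)) {
    chunks.push(current.join("\n"));
  }
  return chunks;
}
```

Each chunk can then be passed to the model separately and the findings concatenated, at the cost of one request per file.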

This is where the quality lives. A vague prompt gives you vague, hallucinated nitpicks. Be specific about what counts as a finding, and what to ignore.

const SYSTEM_PROMPT = `You are a senior code reviewer. You review git diffs for bugs only.

Focus on:
- Logic errors (off-by-one, inverted conditions, wrong operators)
- Null/undefined access and unhandled error cases
- Resource leaks (unclosed handles, missing awaits)
- Security issues (injection, hardcoded secrets, unsafe input)

Do NOT report:
- Style, formatting, or naming preferences
- Suggestions to add comments or tests
- Anything you are not confident is an actual bug

Lines starting with "+" are added. Lines starting with "-" are removed.
Only review added ("+") lines. Respond with ONLY valid JSON.`;

The "do NOT report" block is doing heavy lifting. Small models love to pad output with "consider adding a comment here." Telling them what to suppress is more effective than telling them what to find.

The instruction to only review `+` lines matters too. Without it, the model will happily flag a bug in code you just deleted, which is both useless and confusing. Diffs are a strange dialect to a model trained mostly on whole files, so being explicit about what the `+` and `-` prefixes mean pays off in fewer nonsense findings.

Ollama exposes an OpenAI-compatible API at `localhost:11434`. Spell out the exact schema in the prompt and set `temperature: 0` so the output is as repeatable as the model allows:

const RESPONSE_SCHEMA = `Respond with this exact JSON shape:
{
  "findings": [
    {
      "severity": "high" | "medium" | "low",
      "file": "string",
      "line": "string (the code snippet or line reference)",
      "issue": "string (one sentence: what is wrong)",
      "fix": "string (one sentence: how to fix it)"
    }
  ]
}
If there are no bugs, return { "findings": [] }.`;

async function reviewDiff(diff: string, model: string): Promise<unknown> {
  const response = await fetch("http://localhost:11434/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model,
      messages: [
        { role: "system", content: `${SYSTEM_PROMPT}\n\n${RESPONSE_SCHEMA}` },
        { role: "user", content: `Review this diff:\n\n${diff}` },
      ],
      temperature: 0,
      response_format: { type: "json_object" },
      stream: false,
    }),
  });

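  if (!response.ok) {
    throw new Error(`Ollama returned ${response.status}. Is \`ollama serve\` running?`);
  }

  // The OpenAI-compatible endpoint wraps the reply in choices[0].message.content.
  const data = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return JSON.parse(data.choices[0].message.content);
}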

`response_format: { type: "json_object" }` nudges Ollama into JSON mode, which cuts down on the "Here's your review:" preamble that breaks `JSON.parse`. It isn't a guarantee, though, which is why the next step exists.
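If a preamble or a Markdown code fence does slip through, a tolerant parser can often still recover the payload. The following is a minimal sketch, not part of the original tool: `extractJsonObject` is a hypothetical name, and it assumes the reply contains a single JSON object somewhere in the text.

```typescript
// Hypothetical fallback: recover a JSON object from a reply that may have
// extra text or Markdown code fences around it.
function extractJsonObject(text: string): unknown {
  try {
    return JSON.parse(text); // Fast path: the reply is already pure JSON.
  } catch {
    // Fall through to the scan below.
  }

  // Take everything from the first "{" to the last "}" and try again.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

Swapping this in for the bare `JSON.parse` call in `reviewDiff` would be one way to use it. It still throws on replies that contain no parsable object.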

Never trust raw model output. A 1.5b model will occasionally hand you a string where you expected an array, or invent a severity level. Parse it at the boundary and fail loudly if it's malformed:

import { z } from "zod";

const FindingSchema = z.object({
  severity: z.enum(["high", "medium", "low"]),
  file: z.string(),
  line: z.string(),
  issue: z.string(),
  fix: z.string(),
});

const ReviewSchema = z.object({
  findings: z.array(FindingSchema),
});

type Review = z.infer<typeof ReviewSchema>;

function parseReview(raw: unknown): Review {
  const result = ReviewSchema.safeParse(raw);
  if (!result.success) {
    throw new Error(`Model returned invalid review JSON:\n${result.error.message}`);
  }
  return result.data;
}

Use `safeParse` over `parse` so you can give a useful error instead of an unhandled throw. When this fires, it's almost always the model wandering off-schema, and the fix is usually a smaller diff or a bigger model.
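Schema validation only proves the shape is right, not that the findings point at real code. One possible extra check, sketched here on the assumption that git uses its default `a/` and `b/` path prefixes, is to drop findings whose `file` never appears in the diff. The names `filesInDiff` and `dropUnknownFiles` are hypothetical, and the `Finding` type is redeclared only so the sketch stands alone.

```typescript
// Same shape as the validated finding, redeclared to keep the sketch self-contained.
type Finding = {
  severity: "high" | "medium" | "low";
  file: string;
  line: string;
  issue: string;
  fix: string;
};

// Collect paths from the "+++ b/<path>" headers of a unified diff.
// Deleted files show up as "+++ /dev/null" and are skipped.
function filesInDiff(diff: string): Set<string> {
  const files = new Set<string>();
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ b/")) {
      files.add(line.slice("+++ b/".length).trim());
    }
  }
  return files;
}

// Hypothetical post-filter: keep only findings whose file appears in the diff.
// Models sometimes echo the "a/" or "b/" prefix, so both forms are accepted.
function dropUnknownFiles(findings: Finding[], diff: string): Finding[] {
  const known = filesInDiff(diff);
  return findings.filter(
    (f) => known.has(f.file) || known.has(f.file.replace(/^[ab]\//, "")),
  );
}
```

This only catches findings about files that were never touched. A finding that names the right file but describes a bug that isn't there will still pass.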

Make the output scannable. A reviewer nobody reads is useless:

function printReview(review: Review): number {
  if (review.findings.length === 0) {
    console.log("Local review passed. No bugs found.");
    return 0;
  }

  const icon = { high: "[HIGH]", medium: "[MED] ", low: "[LOW] " };
  let hasHigh = false;

  for (const f of review.findings) {
    if (f.severity === "high") hasHigh = true;
    console.log(`\n${icon[f.severity]} ${f.file}`);
    console.log(`  where: ${f.line}`);
    console.log(`  issue: ${f.issue}`);
    console.log(`  fix:   ${f.fix}`);
  }

  console.log(`\n${review.findings.length} finding(s).`);
  return hasHigh ? 1 : 0;
}

Notice the exit code: only `high` severity blocks the commit. Medium and low get printed as a heads-up but don't stand in your way. Tune that threshold to your team's tolerance.
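To make that threshold tunable rather than hard-coded, the blocking decision can be pulled into a small function. This is a sketch under assumptions: `exitCodeFor` and `SEVERITY_RANK` are hypothetical names, and the ordering low < medium < high is taken from the severity levels used above.

```typescript
type Severity = "high" | "medium" | "low";

// Rank severities so they can be compared numerically.
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2 };

// Hypothetical helper: return 1 if any finding is at or above `threshold`, else 0.
function exitCodeFor(severities: Severity[], threshold: Severity = "high"): number {
  const limit = SEVERITY_RANK[threshold];
  return severities.some((s) => SEVERITY_RANK[s] >= limit) ? 1 : 0;
}
```

With the default, behavior matches `printReview`: only `high` blocks. A stricter team could pass `"medium"`, for example from an environment variable or a CLI argument.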

async function main() {
  const model = process.argv[2] ?? "qwen2.5-coder:7b";
  const diff = getStagedDiff();

  if (diff.trim().length === 0) {
    console.log("No staged changes.");
    process.exit(0);
  }

  try {
    const raw = await reviewDiff(diff, model);
    const review = parseReview(raw);
    process.exit(printReview(review));
  } catch (err) {
    console.error(`Review failed: ${(err as Error).message}`);
    // Don't block commits on tooling failure. Warn and pass.
    process.exit(0);
  }
}

main();

The `catch` is deliberate: if Ollama is down or the JSON is garbage, you log it and let the commit through. A review tool that hard-blocks commits when it itself breaks is a tool people will rip out by Friday.

Build the CLI, then drop a hook into `.git/hooks/pre-commit`:

#!/usr/bin/env bash
set -euo pipefail

echo "Running local AI review..."
node /path/to/review.js qwen2.5-coder:7b

Then make the hook executable:

chmod +x .git/hooks/pre-commit

For a hook the whole team shares, use husky instead so it lives in the repo. Either way, every `git commit` now runs the diff past a local model first. Need to skip it for a quick WIP commit? `git commit --no-verify`.

One thing to watch: the first call, when the model has to load into memory, is slow, often several seconds on CPU. That's Ollama paging the weights in, not your code being slow. Keep `ollama serve` running in the background and subsequent commits feel near-instant. If you commit rarely enough that the model unloads between commits, that cold start is the price you pay each time.
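One way to soften the cold start is to ask Ollama to load the model ahead of time and hold it longer. The sketch below is an assumption layered on top of the article's tool: it uses Ollama's native `/api/generate` endpoint, where a request without a prompt loads the model and `keep_alive` sets how long it stays resident. The helper names and the `"30m"` default are hypothetical.

```typescript
// Hypothetical helper: build a warm-up request for Ollama's native API.
// A generate call with no prompt loads the model without producing text;
// `keep_alive` controls how long it stays in memory afterwards.
function buildWarmupRequest(model: string, keepAlive = "30m") {
  return {
    url: "http://localhost:11434/api/generate",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, keep_alive: keepAlive }),
    },
  };
}

// Best-effort warm-up: failures are ignored because this is only an optimization.
async function warmUp(model: string): Promise<void> {
  const { url, init } = buildWarmupRequest(model);
  try {
    await fetch(url, init);
  } catch {
    // Ollama is not reachable; the review step will surface that.
  }
}
```

Calling `warmUp("qwen2.5-coder:7b")` when you start a work session, for instance from a shell startup script, means the first commit of the day is less likely to pay the load time.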

This is the part most tutorials skip. A local `qwen2.5-coder:7b` is not a staff engineer. Here's the realistic picture:

| Bug type | 1.5b | 7b | Notes |
| --- | --- | --- | --- |
| Null/undefined access | Decent | Good | The model's bread and butter |
| Inverted conditions / wrong operator | Spotty | Decent | Needs enough context (`-U3` helps) |
| Missing `await` | Decent | Good | Easy pattern to catch |
| Subtle race conditions | Misses | Misses | Needs cross-file context it lacks |
| Logic spanning multiple files | Misses | Misses | A diff is a keyhole, not the room |
| False positives | Frequent | Occasional | The main cost of running local |

Two failure modes dominate: it invents bugs that aren't there (false positives), and it misses anything that requires understanding code outside the diff. Here's how to keep it useful anyway:

- `temperature: 0` always.
- `7b` for the hook, `1.5b` for fast local iteration. `1.5b` is ~1GB and quick, but its false-positive rate makes it annoying as a gate. Save it for `--dry-run` style checks.
- Block on `high` only.

A local AI reviewer won't replace your team, and it shouldn't try to. What it does well is catch the careless, three-in-the-afternoon bugs before they reach a pull request: the missing `await`, the `!` you meant to delete, the unhandled `null`. It runs free, it runs private, and it runs every time you commit.

I built the same Claude-plus-Ollama pattern at a larger scale in spectr-ai, an AI smart contract auditor where `--model ollama:qwen2.5-coder:1.5b` runs the entire audit locally with no API key. The diff-reviewer here is the same idea shrunk to fit in a git hook. Steal it, scope your diffs, and let the small model earn its keep.

Source: dev.to (original article)
