How to Build a Self-Hosted AI Code Review Tool in Python

This article provides a guide for building a self-hosted AI code review tool in Python using Ollama and a locally-run language model like CodeLlama or DeepSeek-Coder. The tool reads a git diff, sends it to the local model for analysis, and returns structured review comments that can be integrated into CI workflows. By running entirely on local infrastructure, it ensures that proprietary or sensitive source code never leaves the user's own perimeter.

Most teams share the same code review problem: PRs sit for days, reviewers miss subtle logic bugs, and security issues slip through because nobody carefully checked the authentication layer. Linters catch syntax and style issues, but they don't reason about intent. A language model can, and you can run it entirely on your own infrastructure without sending a single line of your source code to a third party.

This guide walks you through building a self-hosted AI code review tool in Python. It reads a git diff, sends it to a locally hosted language model, and returns structured review comments you can pipe directly into your CI workflow.

Why Self-Hosted Matters

Sending your source code to an external API is a significant trust decision. For proprietary code, regulated industries, or anything security-sensitive, you want model inference happening inside your own perimeter. Ollama handles this cleanly: it runs GGUF-quantized models locally and exposes an OpenAI-compatible HTTP endpoint (under /v1) that the OpenAI Python SDK can talk to. You get the same API surface with zero data egress.

The architecture is intentionally simple:

- A Python script reads a git diff from a file or stdin
- It splits the diff into manageable chunks
- Each chunk is sent to the local LLM with a structured system prompt
- The model returns JSON-formatted review comments
- You aggregate and display them, or feed them into your CI gate

Setting Up

You need Python 3.11+, the openai SDK (it works against any OpenAI-compatible endpoint), and Ollama running locally with a code-focused model. The codellama:13b model works well; deepseek-coder:6.7b is faster and nearly as accurate for review tasks.
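Once Ollama is running and the model is pulled, a quick check that the endpoint answers and knows the model can save you from a confusing failure halfway through a review. This is a minimal sketch, assuming the OpenAI SDK's `client.models.list()` (which Ollama's OpenAI-compatible API serves at /v1/models) and assuming Ollama reports the model under the same name:tag you pulled. `model_is_available` is a hypothetical helper, not part of either library.

```python
import sys
from typing import Any


def model_is_available(client: Any, model: str) -> bool:
    """Return True if `model` is listed by the OpenAI-compatible endpoint.

    `client` is expected to behave like openai.OpenAI: client.models.list()
    yields objects with an `id` attribute (e.g. "deepseek-coder:6.7b").
    """
    try:
        available = {m.id for m in client.models.list()}
    except Exception as exc:  # broad on purpose: connection refused, timeouts, HTTP errors
        print(f"model endpoint unreachable: {exc}", file=sys.stderr)
        return False
    return model in available
```

With a real client, `model_is_available(OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"), "deepseek-coder:6.7b")` should return True once `ollama pull` has finished. Taking the client as a parameter keeps the check testable with a stub.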
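```
pip install openai gitpython
ollama pull deepseek-coder:6.7b
```

Store your config in a .env file. The script reads everything from environment variables, so swapping models requires no code changes:

```
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_API_KEY=ollama
OLLAMA_MODEL=deepseek-coder:6.7b
```

The script does not load .env by itself: export the variables into your shell first (for example with `set -a; . ./.env; set +a`) or use a loader such as python-dotenv.

The Core Reviewer

The script reads a diff from a file argument or from stdin (which makes it trivial to wire into a git hook), sends it to the model in chunks, and parses the structured output.

```python
import json
import os
import sys

from openai import OpenAI

# Ollama serves an OpenAI-compatible API under /v1. The SDK requires an API key,
# but Ollama ignores its value.
client = OpenAI(
    base_url=os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("OLLAMA_API_KEY", "ollama"),
)
MODEL = os.getenv("OLLAMA_MODEL", "deepseek-coder:6.7b")

SYSTEM_PROMPT = (
    "You are a senior software engineer performing a code review.\n"
    "Analyze the provided code diff and return a JSON array of review comments.\n"
    "Each comment must have: severity (critical/warning/suggestion), "
    "line (int or null), message (str), fix (str or null).\n"
    "Return ONLY valid JSON. No prose outside the JSON array."
)

# Sort order for output; severities the model invents sort last instead of raising.
SEVERITY_RANK = {"critical": 0, "warning": 1, "suggestion": 2}
FENCE = "`" * 3


def parse_comments(raw: str) -> list[dict]:
    """Parse the model's reply into a list of comment dicts."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Models often wrap JSON in a Markdown fence despite the prompt; strip it.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        print("warning: model returned invalid JSON; chunk skipped", file=sys.stderr)
        return []
    return [c for c in data if isinstance(c, dict)] if isinstance(data, list) else []


def severity_of(comment: dict) -> str:
    return str(comment.get("severity", "?")).lower()


def review_diff(diff_text: str, max_chunk_chars: int = 6000) -> list[dict]:
    # Split on line boundaries into chunks of roughly max_chunk_chars characters.
    lines = diff_text.splitlines(keepends=True)
    chunks, current, current_len = [], [], 0
    for line in lines:
        if current_len + len(line) > max_chunk_chars and current:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(line)
        current_len += len(line)
    if current:
        chunks.append("".join(current))

    all_comments = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model=MODEL,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": f"Review this diff:\n\n{chunk}"},
            ],
            temperature=0.1,  # low temperature keeps the output stable
            max_tokens=1024,
        )
        all_comments.extend(parse_comments(response.choices[0].message.content or ""))
    return all_comments


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            diff = f.read()
    else:
        diff = sys.stdin.read()

    comments = review_diff(diff)
    has_critical = False
    for c in sorted(comments, key=lambda x: SEVERITY_RANK.get(severity_of(x), len(SEVERITY_RANK))):
        print(f"[{severity_of(c).upper()}] line {c.get('line') or '?'}: {c.get('message', '')}")
        if c.get("fix"):
            print(f"  → {c['fix']}\n")
        if severity_of(c) == "critical":
            has_critical = True
    sys.exit(1 if has_critical else 0)
```

The script exits with code 1 if any critical issue is found, making it trivial to use as a blocking pre-push hook or CI gate.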
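For the pre-push case, git feeds the hook one line per pushed ref on stdin, in the form `<local ref> <local sha> <remote ref> <remote sha>`, with an all-zero SHA standing for "does not exist". The minimal sketch below turns those lines into a diff for the reviewer. `push_ranges` and `collect_diff` are hypothetical helper names, and skipping brand-new branches (remote SHA all zeros) is a simplifying assumption; diffing them against a default branch would be one alternative.

```python
import subprocess


def _is_zero_sha(sha: str) -> bool:
    # Git uses an all-zero object name for "does not exist" (40 chars in SHA-1 repos).
    return set(sha) == {"0"}


def push_ranges(stdin_text: str) -> list[str]:
    """Turn pre-push stdin lines into `git diff` revision ranges (remote..local)."""
    ranges = []
    for line in stdin_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # ignore blank or malformed lines
        _local_ref, local_sha, _remote_ref, remote_sha = parts
        if _is_zero_sha(local_sha):
            continue  # ref deletion: nothing to review
        if _is_zero_sha(remote_sha):
            continue  # new remote branch: no base to diff against (skipped here)
        ranges.append(f"{remote_sha}..{local_sha}")
    return ranges


def collect_diff(ranges: list[str]) -> str:
    """Concatenate `git diff` output for each range.

    Raises subprocess.CalledProcessError if a SHA is unknown locally,
    e.g. when the remote has commits you have not fetched.
    """
    return "".join(
        subprocess.run(
            ["git", "diff", r], capture_output=True, text=True, check=True
        ).stdout
        for r in ranges
    )
```

In an executable .git/hooks/pre-push, you might pass `collect_diff(push_ranges(sys.stdin.read()))` to `review_diff` and exit non-zero when any comment is critical; git aborts the push whenever the hook exits non-zero.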
Integrating into CI

For GitHub Actions, run the reviewer on every pull request diff:

```
name: AI Code Review
on: pull_request
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install openai
      - name: Run AI review
        env:
          OLLAMA_BASE_URL: ${{ secrets.OLLAMA_BASE_URL }}
          OLLAMA_API_KEY: ${{ secrets.OLLAMA_API_KEY }}
          OLLAMA_MODEL: deepseek-coder:6.7b
        run: |
          git diff origin/${{ github.base_ref }}...HEAD > pr.diff
          python reviewer.py pr.diff
```

A GitHub-hosted runner such as ubuntu-latest runs outside your network, so OLLAMA_BASE_URL must point at an Ollama endpoint that runner can reach; localhost will not work there.

For self-hosted CI (Gitea Actions, GitLab CI, Jenkins), point OLLAMA_BASE_URL at your internal Ollama instance. The runner needs network access to it, but nothing leaves your perimeter. If your Ollama node lives on a private subnet, use a dedicated runner in that subnet rather than routing through a proxy.

Hardening the Prompt for Security Review

The default prompt covers general code quality. When you want security-focused output, for example as a pre-merge gate on sensitive services, specialize the system prompt:

```python
SECURITY_PROMPT = (
    "You are a security-focused code reviewer.\n"
    "Flag only security vulnerabilities: injection flaws, auth bypasses, "
    "insecure deserialization, hardcoded credentials, missing input validation, "
    "race conditions, and OWASP Top 10 patterns.\n"
    "Return a JSON array: [{severity, cwe, line, message, fix}]. "
    "Return ONLY valid JSON."
)
```

Swap this in for `SYSTEM_PROMPT`. The `cwe` field is useful if you want to integrate findings with a vulnerability tracker or feed them into a risk-scoring pipeline.

Keep in mind that language models produce false positives at a non-trivial rate. Treat this layer as a fast first-pass triage, not a substitute for manual review. For a structured view of what to actually check before shipping to production, our security hardening checklists (https://ayinedjimi-consultants.fr/checklists) cover the most common vulnerability classes by language and framework.
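Before security findings go to a tracker, the `cwe` values usually need normalizing, since a model can be inconsistent about the format ("CWE-89", "cwe 89", a bare 89). The sketch below, with hypothetical helper names `normalize_cwe` and `group_by_cwe`, maps those variants to a canonical `CWE-<id>` string and buckets comments by weakness. Anything it cannot parse lands under UNCLASSIFIED rather than being guessed.

```python
import re
from collections import defaultdict
from typing import Optional

# Accepts "CWE-89", "cwe 89", "CWE_89: SQL injection" or a bare "89"; rejects
# strings like "A03:2021" (an OWASP category, not a CWE id).
_CWE_RE = re.compile(r"^\s*(?:CWE[\s_:-]*)?(\d+)\b", re.IGNORECASE)


def normalize_cwe(value: object) -> Optional[str]:
    """Return a canonical 'CWE-<id>' string, or None if value isn't a CWE id."""
    if value is None:
        return None
    match = _CWE_RE.match(str(value))
    return f"CWE-{int(match.group(1))}" if match else None


def group_by_cwe(comments: list[dict]) -> dict[str, list[dict]]:
    """Bucket review comments by normalized CWE id."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for comment in comments:
        key = normalize_cwe(comment.get("cwe")) or "UNCLASSIFIED"
        groups[key].append(comment)
    return dict(groups)
```

Keying on the normalized id means two findings the model labeled differently ("CWE-89" and 89) still count as the same weakness in trend or risk reports.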
Splitting Large Diffs by File

Chunking by character count works, but it can split a file mid-hunk and confuse the model. Splitting on file boundaries gives better results:

```python
import re


def split_diff_by_file(diff_text: str) -> list[str]:
    # Zero-width lookahead: split before every "diff --git" header, keeping it.
    parts = re.split(r"(?=^diff --git )", diff_text, flags=re.MULTILINE)
    return [p for p in parts if p.strip()]


def review_all_files(diff_text: str) -> list[dict]:
    all_comments = []
    for file_diff in split_diff_by_file(diff_text):
        # review_diff still falls back to character chunking for very large files.
        all_comments.extend(review_diff(file_diff))
    return all_comments
```

To use it, call `review_all_files(diff)` instead of `review_diff(diff)` in the script's entry point. For very large files (300+ changed lines), split further on the @@ hunk markers. In practice, the model's effective context for code analysis tends to degrade past roughly 4,000 tokens of diff; smaller, focused chunks consistently produce better output than one large dump.

The Takeaway

Self-hosted AI code review earns its place in the pipeline as a fast, cheap first-pass filter. It catches common patterns (missing error handling, SQL queries built with f-strings, hardcoded secrets, unvalidated user input) before a human reviewer ever opens the PR. The setup is lightweight: Ollama, one Python file, a CI step.

What it won't replace: architectural review, business logic validation, and nuanced security analysis that requires understanding your domain. The model doesn't know your codebase's invariants or threat model. But for the low-hanging fruit, it reliably pays for itself.

From here, you can extend this foundation: add a SQLite store to track comment trends over time, wire up the GitHub Reviews API to post inline comments on the PR diff, or build a prompt library with different reviewer personas (security, performance, readability). The pattern is solid; the specialization is up to you.

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. We publish free security hardening checklists in PDF and Excel.
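For the further split on @@ hunk markers suggested above for files with 300+ changed lines, one possible shape follows. The helper names are hypothetical, and the 300-line threshold simply mirrors the rule of thumb from the splitting section. Each hunk is prefixed with its file's header lines so the model still knows which file it is reading.

```python
import re

# In a unified diff, hunk body lines start with " ", "+", "-" or "\", so a line
# starting with "@@ " can only be a hunk header.
_HUNK_RE = re.compile(r"(?=^@@ )", re.MULTILINE)


def split_file_diff_by_hunk(file_diff: str) -> list[str]:
    """Split one file's diff into hunks, each prefixed with the file header."""
    header, *hunks = _HUNK_RE.split(file_diff)
    if not hunks:
        return [file_diff]  # e.g. binary or rename-only diffs have no hunks
    return [header + hunk for hunk in hunks]


def changed_line_count(file_diff: str) -> int:
    """Count added/removed lines inside hunks (file header lines excluded)."""
    return sum(
        1
        for hunk in _HUNK_RE.split(file_diff)[1:]
        for line in hunk.splitlines()[1:]  # skip the "@@ ... @@" line itself
        if line.startswith(("+", "-"))
    )


def chunks_for_file(file_diff: str, max_changed: int = 300) -> list[str]:
    """Keep small file diffs whole; split large ones on hunk boundaries."""
    if changed_line_count(file_diff) < max_changed:
        return [file_diff]
    return split_file_diff_by_hunk(file_diff)
```

Inside `review_all_files`, you would then loop over `chunks_for_file(file_diff)` and call `review_diff` on each piece. Counting only lines inside hunks avoids mistaking the `---`/`+++` file header for changes.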