Source: dev.to · Topic: large-language-models

I Tried to Build an AI Code Reviewer Without Sharing My Code — Here's What Worked

A developer built an AI code reviewer that analyzes pull requests without sending proprietary code to third-party services. After failing with local LLMs and finding existing solutions too expensive or complex, the engineer created a thin Python module that works with any API endpoint supporting the chat completions format. The solution uses structured JSON output and retry logic to produce reliable code reviews from self-hosted or local models.

4 min read · Published Jun 4, 2026

A few weeks ago, I was staring down a pull request with 800+ lines of changes. The team was moving fast, and I wanted a quick sanity check on style, potential bugs, and security concerns. My first instinct? Ask ChatGPT. But then the paranoia set in: I'd be pasting proprietary code into a black box. Our legal team would have a heart attack. So I went looking for a way to run AI-powered code review without sending sensitive data to a third party.

I'm not going to pretend this was a smooth ride. I banged my head against local LLMs, tried to wrangle Python scripts, and eventually landed on a pattern that actually works. This article is the one I wish I'd found back then.

My team uses GitHub Copilot for inline suggestions, but it sends code snippets to Microsoft's cloud. For our internal tools and customer-facing code, that's a hard no. I needed a review bot that could:

I didn't want to maintain a Kubernetes cluster of GPUs, but I also didn't want to sign another enterprise agreement.

I downloaded Llama 3 8B via Ollama. It ran on my laptop. I wrote a Python script to feed it a diff and ask for review. Results? Terrible. The model would hallucinate line numbers, ignore the prompt, and sometimes just ramble about "great code!" without any actionable feedback. Plus, I had to manually handle the API and parse text responses. Not reproducible.
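Hallucinated line numbers can at least be detected mechanically, whichever model produces them. Here is a minimal sketch, assuming the input is a standard unified diff such as `git diff` produces. The helper name `added_line_numbers` is hypothetical and not part of the script described in this article. It only tracks added lines, numbered as they appear in the new version of the file.

```python
import re

# Matches hunk headers such as "@@ -10,4 +12,6 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_line_numbers(diff: str) -> set[int]:
    """Return new-file line numbers of every added line in a unified diff."""
    added: set[int] = set()
    old_left = new_left = 0  # lines still expected in the current hunk
    new_line = 0             # current line number in the new file
    for line in diff.splitlines():
        if old_left <= 0 and new_left <= 0:
            # Outside a hunk: skip file headers until the next hunk header.
            match = HUNK_RE.match(line)
            if match:
                old_left = int(match.group(1) or 1)
                new_line = int(match.group(2))
                new_left = int(match.group(3) or 1)
            continue
        if line.startswith("+"):
            added.add(new_line)
            new_line += 1
            new_left -= 1
        elif line.startswith("-"):
            old_left -= 1
        elif line.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both old and new versions.
            new_line += 1
            old_left -= 1
            new_left -= 1
    return added
```

Any line number a model reports that is missing from the returned set can be dropped or flagged before a human reads the review.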

LangChain seemed like the obvious answer. I set up a chain with a prompt template and a local model. The setup was clunky, dependencies were heavy, and the structured output parser kept breaking. When it finally worked, the latency was 30+ seconds per review. Not usable for real-time PR checks.

I evaluated a few providers that promised data privacy. Most required a yearly contract, a dedicated endpoint, and a minimum spend. For a solo developer or small team, overkill.

I stepped back and realized the core problem wasn't the model — it was the integration pattern. I needed:

I wrote a thin Python module that does exactly that. It works with any API endpoint that supports the chat completions format. Here's the core of it.

import json
import time

import httpx


class AICodeReviewer:
    def __init__(self, api_url: str, api_key: str, model: str = "gpt-4o"):
        # base_url lets the same class target any chat-completions-compatible server.
        self.client = httpx.Client(base_url=api_url, timeout=120.0)
        self.api_key = api_key
        self.model = model

    def review_diff(self, diff: str, max_retries: int = 3) -> dict:
        prompt = f"""
You are a senior code reviewer. Analyze the following diff and return a JSON object with:
- 'summary': a short summary of the changes
- 'issues': an array of objects each with 'line', 'severity' (critical/warning/info), 'message', and optionally 'suggestion'
- 'score': an integer from 1 to 10

Diff:
{diff}
"""
        last_error = None
        for attempt in range(max_retries):
            try:
                response = self.client.post(
                    "/v1/chat/completions",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json={
                        "model": self.model,
                        "messages": [{"role": "user", "content": prompt}],
                        "response_format": {"type": "json_object"},
                        "max_tokens": 2048,
                    },
                )
                response.raise_for_status()
                data = response.json()
                content = data["choices"][0]["message"]["content"]
                return json.loads(content)
            # httpx.HTTPError covers bad status codes and transport failures such as timeouts.
            except (httpx.HTTPError, json.JSONDecodeError, KeyError, IndexError) as e:
                last_error = e
                print(f"Attempt {attempt+1} failed: {e}")
                # Exponential backoff (1s, 2s, ...); no point sleeping after the final attempt.
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)
        raise RuntimeError("All retries exhausted") from last_error

The response_format field is key: even smaller local models can produce valid JSON if you prompt them right and request structured output. (Most OpenAI-compatible local backends now support this.)
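Valid JSON is not the same as JSON in the expected shape, so a check before trusting the result can help. The following is a minimal sketch, assuming the fields listed in the prompt above (summary, issues, score). The function name `validate_review` is hypothetical and not part of the original module.

```python
ALLOWED_SEVERITIES = {"critical", "warning", "info"}


def _is_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_review(review: object) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    if not isinstance(review, dict):
        return ["top-level value is not a JSON object"]
    problems: list[str] = []
    if not isinstance(review.get("summary"), str):
        problems.append("'summary' must be a string")
    score = review.get("score")
    if not _is_int(score) or not 1 <= score <= 10:
        problems.append("'score' must be an integer from 1 to 10")
    issues = review.get("issues")
    if not isinstance(issues, list):
        problems.append("'issues' must be an array")
        return problems
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{i}] is not an object")
            continue
        if not _is_int(issue.get("line")):
            problems.append(f"issues[{i}].line must be an integer")
        severity = issue.get("severity")
        if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
            problems.append(f"issues[{i}].severity must be critical, warning, or info")
        if not isinstance(issue.get("message"), str):
            problems.append(f"issues[{i}].message must be a string")
        if "suggestion" in issue and not isinstance(issue["suggestion"], str):
            problems.append(f"issues[{i}].suggestion must be a string when present")
    return problems
```

A non-empty result could be treated like any other failed attempt and sent through the same retry loop.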

I tested this with a self-hosted endpoint I set up for internal use (pointed at an API like https://ai.interwestinfo.com/ but you can use any compatible provider). The code doesn't care whether the model lives on a GPU cluster or a Raspberry Pi.

I wrapped the above into a CLI that takes a file or piped diff:

$ cat example.diff | python review.py --endpoint https://my-ai-server.com --api-key $KEY --model code-review-7b

Output:

{
  "summary": "Adds authentication middleware",
  "issues": [
    {
      "line": 34,
      "severity": "critical",
      "message": "Hardcoded JWT secret",
      "suggestion": "Use environment variable"
    },
    {
      "line": 102,
      "severity": "info",
      "message": "Missing error handling for token expiry"
    }
  ],
  "score": 6
}
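The CLI wrapper itself is not shown here. A minimal sketch of its input handling might look like the following, assuming the three flags from the command above. The optional positional `diff_file` argument and the function names are assumptions, and the call into AICodeReviewer is left out.

```python
import argparse
import sys
from typing import Optional, TextIO


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="review.py",
        description="Send a diff to a chat-completions endpoint for review.",
    )
    # Hypothetical positional argument; when omitted, the diff is read from stdin.
    parser.add_argument("diff_file", nargs="?", help="path to a diff file")
    parser.add_argument("--endpoint", required=True, help="base URL of the API")
    parser.add_argument("--api-key", required=True, help="bearer token")
    parser.add_argument("--model", default="gpt-4o", help="model name to request")
    return parser


def read_diff(path: Optional[str], stdin: Optional[TextIO] = None) -> str:
    """Read the diff from a file if a path is given, otherwise from stdin."""
    if path:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    return (stdin or sys.stdin).read()
```

With this in place, the wrapper would parse the arguments, call `read_diff(args.diff_file)`, and pass the text to `review_diff`.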

Now I can pipe any diff into this tool, get structured feedback, and integrate it into a GitHub Action or pre-commit hook.
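For a GitHub Action or pre-commit hook, the review has to become an exit code. One possible sketch, assuming the JSON shape shown above; the function name `exit_code_for` and the `fail_on` threshold are hypothetical and not part of the original tool:

```python
# Severity ranking, lowest to highest, using the three levels from the prompt.
SEVERITY_ORDER = {"info": 0, "warning": 1, "critical": 2}


def exit_code_for(review: dict, fail_on: str = "critical") -> int:
    """Return 1 if any issue is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_ORDER[fail_on]
    for issue in review.get("issues", []):
        # Unknown severities are treated as the lowest level.
        rank = SEVERITY_ORDER.get(str(issue.get("severity")), 0)
        if rank >= threshold:
            return 1
    return 0
```

Passing the result to `sys.exit` lets the hook block a commit on critical findings while letting info-level notes through.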

[...] response_format or function calling.

[...] response_format already existed.

I'm still iterating on this. Next up is adding batch review of multiple files and caching results to speed up repeated checks. The whole repository is a single Python file and a Dockerfile. It's not a product, just a tool that scratched my itch.
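Neither batch review nor caching exists in the tool yet, so the following is only one way they might be approached. It is a sketch, assuming git-style diffs where each file section starts with a "diff --git" line. The names `split_by_file` and `cached_review` and the on-disk JSON cache layout are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def split_by_file(diff: str) -> list[str]:
    """Split a multi-file git diff into one chunk per file."""
    chunks: list[str] = []
    current: list[str] = []
    for line in diff.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            chunks.append("".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("".join(current))
    return chunks


def cached_review(
    diff: str,
    model: str,
    review_fn: Callable[[str], dict],
    cache_dir: str,
) -> dict:
    """Call review_fn only when this exact diff and model have not been seen."""
    # The key covers both model and diff, so switching models invalidates the cache.
    key = hashlib.sha256(f"{model}\0{diff}".encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = review_fn(diff)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Here `review_fn` would be something like the `review_diff` method shown earlier, called once per chunk from `split_by_file`.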

How do you handle AI-powered code review in a privacy-sensitive environment? Are you running local models or trusting a provider? I'd love to hear what's working (or not) for you.
