# I Tried to Build an AI Code Reviewer Without Sharing My Code — Here's What Worked

> Source: <https://dev.to/__c1b9e06dc90a7e0a676b/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-heres-what-worked-212f>
> Published: 2026-06-04 02:00:49+00:00

A few weeks ago, I was staring down a pull request with 800+ lines of changes. The team was moving fast, and I wanted a quick sanity check on style, potential bugs, and security concerns. My first instinct? Ask ChatGPT. But then the paranoia set in: I'd be pasting proprietary code into a black box. Our legal team would have a heart attack. So I went looking for a way to run AI-powered code review without sending sensitive data to a third party.

I'm not going to pretend this was a smooth ride. I banged my head against local LLMs, tried to wrangle Python scripts, and eventually landed on a pattern that actually works. This article is the one I wish I'd found back then.

My team uses GitHub Copilot for inline suggestions, but it sends code snippets to Microsoft's cloud. For our internal tools and customer-facing code, that's a hard no. I needed a review bot that could:

I didn't want to maintain a Kubernetes cluster of GPUs, but I also didn't want to sign another enterprise agreement.

I downloaded Llama 3 8B via Ollama. It ran on my laptop. I wrote a Python script to feed it a diff and ask for review. Results? Terrible. The model would hallucinate line numbers, ignore the prompt, and sometimes just ramble about "great code!" without any actionable feedback. Plus, I had to manually handle the API and parse text responses. Not reproducible.

LangChain seemed like the obvious answer. I set up a chain with a prompt template and a local model. The setup was clunky, dependencies were heavy, and the structured output parser kept breaking. When it finally worked, the latency was 30+ seconds per review. Not usable for real-time PR checks.

I evaluated a few providers that promised data privacy. Most required a yearly contract, a dedicated endpoint, and a minimum spend. For a solo developer or small team, overkill.

I stepped back and realized the core problem wasn't the model — it was the integration pattern. I needed:

I wrote a thin Python module that does exactly that. It works with any API endpoint that supports the chat completions format. Here's the core of it.

``` python
import json
import time

import httpx

class AICodeReviewer:
    def __init__(self, api_url: str, api_key: str, model: str = "gpt-4o"):
        self.client = httpx.Client(base_url=api_url, timeout=120.0)
        self.api_key = api_key
        self.model = model

    def review_diff(self, diff: str, max_retries: int = 3) -> dict:
        prompt = f"""
You are a senior code reviewer. Analyze the following diff and return a JSON object with:
- 'summary': a short summary of the changes
- 'issues': an array of objects each with 'line', 'severity' (critical/warning/info), 'message', and optionally 'suggestion'
- 'score': an integer from 1 to 10

Diff:
{diff}
"""
        for attempt in range(max_retries):
            try:
                response = self.client.post(
                    "/v1/chat/completions",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json={
                        "model": self.model,
                        "messages": [{"role": "user", "content": prompt}],
                        "response_format": {"type": "json_object"},
                        "max_tokens": 2048
                    }
                )
                response.raise_for_status()
                data = response.json()
                content = data["choices"][0]["message"]["content"]
                return json.loads(content)
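            except (httpx.HTTPError, json.JSONDecodeError, KeyError, IndexError) as e:
                # httpx.HTTPError covers error statuses as well as timeouts and connection failures
                print(f"Attempt {attempt + 1} failed: {e}")
                # Back off exponentially, but skip the sleep after the final attempt
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)
        raise RuntimeError("All retries exhausted")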
```

The `response_format` field is key: even smaller local models can produce valid JSON if you prompt them right and request structured output. (Most OpenAI-compatible local backends now support this.)
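One caveat worth guarding against: as far as I can tell, `json_object` mode only asks the backend for syntactically valid JSON, not for the particular keys the prompt describes. Below is a minimal sketch of a shape check for the parsed result. The name `validate_review` and the exact rules are my own assumptions, derived from the fields listed in the prompt above, not part of the original module.

```python
ALLOWED_SEVERITIES = {"critical", "warning", "info"}


def validate_review(review: object) -> list[str]:
    """Return a list of problems with a parsed review; an empty list means it looks usable."""
    if not isinstance(review, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if not isinstance(review.get("summary"), str):
        problems.append("'summary' missing or not a string")
    score = review.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        problems.append("'score' missing or not an integer from 1 to 10")
    issues = review.get("issues")
    if not isinstance(issues, list):
        problems.append("'issues' missing or not an array")
        return problems
    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{i}] is not an object")
            continue
        line = issue.get("line")
        if isinstance(line, bool) or not isinstance(line, int):
            problems.append(f"issues[{i}].line is not an integer")
        severity = issue.get("severity")
        if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
            problems.append(f"issues[{i}].severity is not critical/warning/info")
        if not isinstance(issue.get("message"), str):
            problems.append(f"issues[{i}].message is not a string")
    return problems
```

A caller could treat a non-empty result the same way as a JSON parse failure and retry the request.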

I tested this with a self-hosted endpoint I set up for internal use (pointed at an API like `https://ai.interwestinfo.com/`, but you can use any compatible provider). The code doesn't care whether the model lives on a GPU cluster or a Raspberry Pi.

I wrapped the above into a CLI that takes a file or piped diff:

``` bash
$ cat example.diff | python review.py --endpoint https://my-ai-server.com --api-key "$KEY" --model code-review-7b
```
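For readers who want to reproduce the wrapper, here is a rough sketch of how the argument handling might look. It is not the actual `review.py`: the flag names come from the command above, while the optional `diff_file` positional and the helper names `build_parser` and `read_diff` are assumptions made for illustration.

```python
import argparse
import sys


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Review a diff via an OpenAI-compatible endpoint")
    # Hypothetical positional: when omitted, the diff is read from stdin (the piped case)
    parser.add_argument("diff_file", nargs="?", help="path to a diff file; omit to read stdin")
    parser.add_argument("--endpoint", required=True, help="base URL of the API server")
    parser.add_argument("--api-key", required=True, help="bearer token for the endpoint")
    parser.add_argument("--model", default="gpt-4o", help="model name to request")
    return parser


def read_diff(diff_file, stdin=None) -> str:
    """Read the diff from a file if a path was given, otherwise from stdin."""
    if diff_file:
        with open(diff_file, encoding="utf-8") as f:
            return f.read()
    return (stdin or sys.stdin).read()


# A main() would then look roughly like this (AICodeReviewer is the class shown earlier):
#   args = build_parser().parse_args()
#   reviewer = AICodeReviewer(args.endpoint, args.api_key, args.model)
#   print(json.dumps(reviewer.review_diff(read_diff(args.diff_file)), indent=2))
```

Passing the key on the command line leaves it in shell history and process listings, so reading it from an environment variable inside the script may be the safer default.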

Output:

``` json
{
  "summary": "Adds authentication middleware",
  "issues": [
    {
      "line": 34,
      "severity": "critical",
      "message": "Hardcoded JWT secret",
      "suggestion": "Use environment variable"
    },
    {
      "line": 102,
      "severity": "info",
      "message": "Missing error handling for token expiry"
    }
  ],
  "score": 6
}
```
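Since hallucinated line numbers were one of my early problems, it seems worth sanity-checking the `line` values before trusting them. The sketch below parses unified-diff hunk headers and separates issues whose line falls inside a changed region from those that don't. It assumes the model reports line numbers in the new version of the file; the function names are hypothetical.

```python
import re

# Matches headers such as "@@ -30,4 +30,6 @@" and captures the new-file start and count
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def new_file_lines(diff: str) -> set[int]:
    """Collect the new-file line numbers covered by every hunk in a unified diff."""
    covered = set()
    for line in diff.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            start = int(match.group(1))
            # The count defaults to 1 when the header omits it
            count = int(match.group(2)) if match.group(2) is not None else 1
            covered.update(range(start, start + count))
    return covered


def split_by_plausibility(issues: list[dict], diff: str) -> tuple[list[dict], list[dict]]:
    """Separate issues whose 'line' falls inside a hunk from those that do not."""
    covered = new_file_lines(diff)
    plausible, suspect = [], []
    for issue in issues:
        line = issue.get("line")
        if isinstance(line, int) and line in covered:
            plausible.append(issue)
        else:
            suspect.append(issue)
    return plausible, suspect
```

This version pools line numbers across every file in the diff, so a multi-file diff would need to be split per file first for the check to mean much.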

Now I can pipe any diff into this tool, get structured feedback, and integrate it into a GitHub Action or pre-commit hook.
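For the GitHub Action or pre-commit case, the structured output has to become an exit code at some point. One possible gating rule, sketched below, fails the check on any `critical` issue or on a score under a threshold. The function name `exit_code_for` and the default threshold of 5 are arbitrary choices for illustration, not something I've tuned.

```python
def exit_code_for(review: dict, min_score: int = 5) -> int:
    """Return 1 (fail the check) on any critical issue or a low score, else 0."""
    has_critical = any(
        issue.get("severity") == "critical" for issue in review.get("issues", [])
    )
    # A missing score is treated as 0, so an incomplete review fails closed
    too_low = review.get("score", 0) < min_score
    return 1 if has_critical or too_low else 0


# In a hook script, the reviewer's JSON output could be piped in and turned into a status:
#   import json, sys
#   sys.exit(exit_code_for(json.load(sys.stdin)))
```

Applied to the sample output above, the hardcoded JWT secret alone would fail the check, regardless of the score of 6.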

I'm still iterating on this: next up is adding batch review of multiple files and caching results to speed up repeated checks. The whole repository is a single Python file and a Dockerfile. It's not a product, just a tool that scratched my itch.
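The caching piece doesn't exist yet, but a first version could be as simple as keying stored reviews on a hash of the model name and the diff. The sketch below is one way to do that, under the assumption that the same model and diff should yield a reusable review; `cached_review`, `review_fn`, and the on-disk layout are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def cached_review(diff: str, model: str, review_fn, cache_dir: Path) -> dict:
    """Return a stored review for this (model, diff) pair, calling review_fn only on a miss."""
    key = hashlib.sha256(f"{model}\n{diff}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    # Cache miss: run the real review and store the result for next time
    review = review_fn(diff)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(review), encoding="utf-8")
    return review
```

Here `review_fn` would be something like the `review_diff` method of a configured reviewer. Including the model name in the key keeps reviews from different models from overwriting each other.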

How do you handle AI-powered code review in a privacy-sensitive environment? Are you running local models or trusting a provider? I'd love to hear what's working (or not) for you.
