A few weeks ago, I was staring down a pull request with 800+ lines of changes. The team was moving fast, and I wanted a quick sanity check on style, potential bugs, and security concerns. My first instinct? Ask ChatGPT. But then the paranoia set in: I'd be pasting proprietary code into a black box. Our legal team would have a heart attack. So I went looking for a way to run AI-powered code review without sending sensitive data to a third party.
I'm not going to pretend this was a smooth ride. I banged my head against local LLMs, tried to wrangle Python scripts, and eventually landed on a pattern that actually works. This article is the one I wish I'd found back then.
My team uses GitHub Copilot for inline suggestions, but it sends code snippets to Microsoft's cloud. For our internal tools and customer-facing code, that's a hard no. I needed a review bot that could:
I didn't want to maintain a Kubernetes cluster of GPUs, but I also didn't want to sign another enterprise agreement.
I downloaded Llama 3 8B via Ollama. It ran on my laptop. I wrote a Python script to feed it a diff and ask for review. Results? Terrible. The model would hallucinate line numbers, ignore the prompt, and sometimes just ramble about "great code!" without any actionable feedback. Plus, I had to manually handle the API and parse text responses. Not reproducible.
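Hallucinated line numbers are at least detectable after the fact. As a rough sketch (not something from the original script), the helper below walks a unified diff, collects the new-file line numbers that were actually added, and drops any issue that points elsewhere. The names `added_line_numbers` and `drop_unknown_lines` are hypothetical, and the sketch assumes the diff covers a single file, since the issue objects carry only a line number.

```python
import re

# Matches hunk headers such as "@@ -10,4 +12,6 @@"; a missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_line_numbers(diff: str) -> set[int]:
    """Return new-file line numbers of lines added by a single-file unified diff."""
    added: set[int] = set()
    old_left = new_left = 0  # lines still expected in the current hunk
    new_line = 0             # current line number in the new file
    for raw in diff.splitlines():
        if old_left == 0 and new_left == 0:
            # Outside a hunk: skip file headers until the next hunk header.
            match = HUNK_HEADER.match(raw)
            if match:
                old_left = int(match.group(1) or 1)
                new_line = int(match.group(2))
                new_left = int(match.group(3) or 1)
            continue
        if raw.startswith("+"):
            added.add(new_line)
            new_line += 1
            new_left -= 1
        elif raw.startswith("-"):
            old_left -= 1
        elif raw.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        else:
            # Context line: present in both the old and the new file.
            new_line += 1
            old_left -= 1
            new_left -= 1
    return added


def drop_unknown_lines(issues: list[dict], diff: str) -> list[dict]:
    """Keep only issues whose 'line' refers to a line the diff actually added."""
    valid = added_line_numbers(diff)
    return [issue for issue in issues if issue.get("line") in valid]
```

This only filters out impossible line numbers; it cannot tell whether a plausible line number is attached to the right complaint.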
LangChain seemed like the obvious answer. I set up a chain with a prompt template and a local model. The setup was clunky, dependencies were heavy, and the structured output parser kept breaking. When it finally worked, the latency was 30+ seconds per review. Not usable for real-time PR checks.
I evaluated a few providers that promised data privacy. Most required a yearly contract, a dedicated endpoint, and a minimum spend. For a solo developer or small team, overkill.
I stepped back and realized the core problem wasn't the model — it was the integration pattern. I needed:
I wrote a thin Python module that does exactly that. It works with any API endpoint that supports the chat completions format. Here's the core of it.
```python
import json
import time

import httpx


class AICodeReviewer:
    """Thin client for any endpoint that speaks the chat completions format."""

    def __init__(self, api_url: str, api_key: str, model: str = "gpt-4o"):
        # Generous timeout: self-hosted models can be slow on large diffs.
        self.client = httpx.Client(base_url=api_url, timeout=120.0)
        self.api_key = api_key
        self.model = model

    def review_diff(self, diff: str, max_retries: int = 3) -> dict:
        prompt = f"""
        You are a senior code reviewer. Analyze the following diff and return a JSON object with:
        - 'summary': a short summary of the changes
        - 'issues': an array of objects each with 'line', 'severity' (critical/warning/info), 'message', and optionally 'suggestion'
        - 'score': an integer from 1 to 10
        Diff:
        {diff}
        """
        last_error = None
        for attempt in range(max_retries):
            try:
                response = self.client.post(
                    "/v1/chat/completions",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json={
                        "model": self.model,
                        "messages": [{"role": "user", "content": prompt}],
                        "response_format": {"type": "json_object"},
                        "max_tokens": 2048,
                    },
                )
                response.raise_for_status()
                data = response.json()
                content = data["choices"][0]["message"]["content"]
                return json.loads(content)
            # httpx.HTTPError covers bad status codes as well as timeouts
            # and connection failures.
            except (httpx.HTTPError, json.JSONDecodeError, KeyError, IndexError) as e:
                last_error = e
                print(f"Attempt {attempt + 1} failed: {e}")
                # Exponential backoff, but no pointless sleep after the last try.
                if attempt < max_retries - 1:
                    time.sleep(2 ** attempt)
        raise RuntimeError("All retries exhausted") from last_error
```
The `response_format` field is key: even smaller local models can produce valid JSON if you prompt them right and request structured output. (Most OpenAI-compatible local backends now support this.)
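Valid JSON is not the same as JSON in the requested shape, so it can help to check the parsed result against what the prompt asked for. The following is a minimal sketch under that assumption; `validate_review` is a hypothetical helper, not part of the module above, and the rules it enforces are read directly off the prompt (string summary, issues with `line`, `severity`, `message`, optional `suggestion`, and a score from 1 to 10).

```python
from typing import Any

ALLOWED_SEVERITIES = ("critical", "warning", "info")


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_review(review: Any) -> list[str]:
    """Return a list of problems; an empty list means the shape matches the prompt."""
    if not isinstance(review, dict):
        return ["top-level value is not a JSON object"]

    problems: list[str] = []
    if not isinstance(review.get("summary"), str):
        problems.append("'summary' must be a string")

    score = review.get("score")
    if not _is_int(score) or not 1 <= score <= 10:
        problems.append("'score' must be an integer from 1 to 10")

    issues = review.get("issues")
    if not isinstance(issues, list):
        problems.append("'issues' must be an array")
        return problems

    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{i}] must be an object")
            continue
        line = issue.get("line")
        if not _is_int(line) or line < 1:
            problems.append(f"issues[{i}].line must be a positive integer")
        if issue.get("severity") not in ALLOWED_SEVERITIES:
            problems.append(f"issues[{i}].severity must be one of {ALLOWED_SEVERITIES}")
        if not isinstance(issue.get("message"), str):
            problems.append(f"issues[{i}].message must be a string")
        if "suggestion" in issue and not isinstance(issue["suggestion"], str):
            problems.append(f"issues[{i}].suggestion must be a string when present")
    return problems
```

A non-empty result could be treated like any other failed attempt and fed into the existing retry loop, though whether a retry helps will depend on the model.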
I tested this with a self-hosted endpoint I set up for internal use (pointed at an API like https://ai.interwestinfo.com/, but you can use any compatible provider). The code doesn't care whether the model lives on a GPU cluster or a Raspberry Pi.
I wrapped the above into a CLI that takes a file or piped diff:
```
$ cat example.diff | python review.py --endpoint https://my-ai-server.com --api-key $KEY --model code-review-7b
```

Output:

```json
{
  "summary": "Adds authentication middleware",
  "issues": [
    {
      "line": 34,
      "severity": "critical",
      "message": "Hardcoded JWT secret",
      "suggestion": "Use environment variable"
    },
    {
      "line": 102,
      "severity": "info",
      "message": "Missing error handling for token expiry"
    }
  ],
  "score": 6
}
```
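The wrapper itself isn't shown here, so the following is only a sketch of what such a CLI could look like. The flags `--endpoint`, `--api-key`, and `--model` come from the command above; the optional positional `diff_file`, the `gpt-4o` default, and the injected `review` callable are assumptions made to keep the example self-contained and testable without a network.

```python
import argparse
import json
import sys
from typing import Callable, Optional, Sequence, TextIO


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="review.py", description="Review a diff with an AI endpoint."
    )
    # Hypothetical positional argument: when omitted, the diff is read from stdin.
    parser.add_argument("diff_file", nargs="?", help="path to a diff file")
    parser.add_argument("--endpoint", required=True, help="base URL of the API")
    parser.add_argument("--api-key", required=True, help="bearer token")
    parser.add_argument("--model", default="gpt-4o", help="model name")
    return parser


def read_diff(path: Optional[str], stdin: TextIO) -> str:
    """Read the diff from a file when a path is given, otherwise from stdin."""
    if path:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    return stdin.read()


def main(
    argv: Sequence[str],
    review: Callable[[argparse.Namespace, str], dict],
    stdin: Optional[TextIO] = None,
    stdout: Optional[TextIO] = None,
) -> int:
    """Parse arguments, run the injected review function, print JSON."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    args = build_parser().parse_args(argv)
    diff = read_diff(args.diff_file, stdin)
    if not diff.strip():
        print("No diff provided", file=sys.stderr)
        return 1
    result = review(args, diff)
    json.dump(result, stdout, indent=2)
    stdout.write("\n")
    return 0
```

In a real `review.py`, the `review` argument would presumably be something like `lambda args, diff: AICodeReviewer(args.endpoint, args.api_key, args.model).review_diff(diff)`, with `sys.exit(main(sys.argv[1:], ...))` at the bottom of the file.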
Now I can pipe any diff into this tool, get structured feedback, and integrate it into a GitHub Action or pre-commit hook.
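Both GitHub Actions steps and pre-commit hooks treat a non-zero exit code as a failure, so the structured output can be turned into a pass/fail gate with very little code. This is a sketch of one possible policy, not the tool's actual behavior: the `fail_on` threshold, the `min_score` option, and the function names are all hypothetical.

```python
from typing import Optional

# Ordering of the severities named in the prompt, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def blocking_issues(review: dict, fail_on: str = "critical") -> list[dict]:
    """Return the issues at or above the given severity threshold."""
    threshold = SEVERITY_RANK[fail_on]
    return [
        issue
        for issue in review.get("issues", [])
        # Unknown severities are treated as 'info' rather than crashing the gate.
        if SEVERITY_RANK.get(issue.get("severity"), 0) >= threshold
    ]


def exit_code(
    review: dict, fail_on: str = "critical", min_score: Optional[int] = None
) -> int:
    """Return 1 when the review should fail the check, 0 otherwise."""
    if blocking_issues(review, fail_on):
        return 1
    if min_score is not None and review.get("score", 0) < min_score:
        return 1
    return 0
```

Starting with `fail_on="critical"` keeps the gate from blocking merges on every stylistic nitpick, which matters given how noisy model feedback can be.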
I'm still iterating on this. Next up is adding batch review of multiple files and caching results to speed up repeated checks. The whole repository is a single Python file and a Dockerfile. It's not a product, just a tool that scratched my own itch.
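The caching idea has not been built yet, so what follows is purely an illustration of one way it might work: key each review on a SHA-256 hash of the model name plus the diff text, and store the JSON result on disk. The names `cache_key` and `cached_review`, and the one-file-per-result layout, are assumptions.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(diff: str, model: str) -> str:
    """Hash the model name and diff so a different model gets a fresh review."""
    digest = hashlib.sha256()
    digest.update(model.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(diff.encode("utf-8"))
    return digest.hexdigest()


def cached_review(
    diff: str,
    model: str,
    review: Callable[[str], dict],
    cache_dir: Path,
) -> dict:
    """Return a stored review for this diff if one exists, else compute and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(diff, model)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = review(diff)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Here `review` would be something like `reviewer.review_diff`. Because model output is not deterministic, a cache like this trades fresh opinions for speed, which seems reasonable for repeated checks on an unchanged diff.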
How do you handle AI-powered code review in a privacy-sensitive environment? Are you running local models or trusting a provider? I'd love to hear what's working (or not) for you.