I Tried to Build an AI Code Reviewer Without Sharing My Code: Here's What Worked

A developer built an AI code reviewer that analyzes pull requests without sending proprietary code to third-party services. After failing with local LLMs and finding existing solutions too expensive or complex, the engineer created a thin Python module that works with any API endpoint supporting the chat completions format. The solution uses structured JSON output and retry logic to produce reliable code reviews from self-hosted or local models.

A few weeks ago, I was staring down a pull request with 800+ lines of changes. The team was moving fast, and I wanted a quick sanity check on style, potential bugs, and security concerns. My first instinct? Ask ChatGPT. But then the paranoia set in: I'd be pasting proprietary code into a black box. Our legal team would have a heart attack. So I went looking for a way to run AI-powered code review without sending sensitive data to a third party.

I'm not going to pretend this was a smooth ride. I banged my head against local LLMs, tried to wrangle Python scripts, and eventually landed on a pattern that actually works. This article is the one I wish I'd found back then.

My team uses GitHub Copilot for inline suggestions, but it sends code snippets to Microsoft's cloud. For our internal tools and customer-facing code, that's a hard no. I needed a review bot that could do the job without our code leaving infrastructure we control. I didn't want to maintain a Kubernetes cluster of GPUs, but I also didn't want to sign another enterprise agreement.

My first attempt was a local model. I downloaded Llama 3 8B via Ollama, and it ran on my laptop. I wrote a Python script to feed it a diff and ask for a review. Results? Terrible. The model would hallucinate line numbers, ignore the prompt, and sometimes just ramble about "great code" without any actionable feedback. Plus, I had to handle the API manually and parse free-text responses, so the results weren't reproducible.

LangChain seemed like the obvious answer. I set up a chain with a prompt template and a local model.
The setup was clunky, the dependencies were heavy, and the structured output parser kept breaking. When it finally worked, the latency was 30+ seconds per review. That's not usable for real-time PR checks.

I also evaluated a few providers that promised data privacy. Most required a yearly contract, a dedicated endpoint, and a minimum spend. For a solo developer or small team, that's overkill.

I stepped back and realized the core problem wasn't the model; it was the integration pattern. I needed a thin client that talks to any chat completions endpoint, asks for structured JSON output, and retries when a response fails. I wrote a thin Python module that does exactly that. It works with any API endpoint that supports the chat completions format. Here's the core of it.

```python
import json
import time

import httpx


class AICodeReviewer:
    def __init__(self, api_url: str, api_key: str, model: str = "gpt-4o"):
        self.client = httpx.Client(base_url=api_url, timeout=120.0)
        self.api_key = api_key
        self.model = model

    def review_diff(self, diff: str, max_retries: int = 3) -> dict:
        prompt = f"""
You are a senior code reviewer. Analyze the following diff and return a JSON object with:
- 'summary': a short summary of the changes
- 'issues': an array of objects each with 'line', 'severity' (critical/warning/info), 'message', and optionally 'suggestion'
- 'score': an integer from 1 to 10

Diff:
{diff}
"""
        for attempt in range(max_retries):
            try:
                response = self.client.post(
                    "/v1/chat/completions",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json={
                        "model": self.model,
                        "messages": [{"role": "user", "content": prompt}],
                        # Ask the backend for a syntactically valid JSON object
                        "response_format": {"type": "json_object"},
                        "max_tokens": 2048,
                    },
                )
                response.raise_for_status()
                data = response.json()
                content = data["choices"][0]["message"]["content"]
                return json.loads(content)
            # httpx.HTTPError covers both bad status codes and transport errors
            except (httpx.HTTPError, json.JSONDecodeError, KeyError, IndexError) as e:
                print(f"Attempt {attempt + 1} failed: {e}")
                time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
        raise RuntimeError("All retries exhausted")
```

The `response_format` field is key: even smaller local models can produce valid JSON if you prompt them right and request structured output.
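
Valid JSON is not the same as JSON with the right shape, so it can help to check the parsed result before trusting it. The following is a minimal sketch and not part of the module above: `validate_review` and `ALLOWED_SEVERITIES` are hypothetical names, and the rules simply mirror the fields requested in the prompt.

```python
from typing import Any

# Assumption: these are the only severities the prompt asks the model to use.
ALLOWED_SEVERITIES = {"critical", "warning", "info"}


def validate_review(review: Any) -> list[str]:
    """Return a list of schema problems; an empty list means the review is usable."""
    if not isinstance(review, dict):
        return ["top-level value is not a JSON object"]

    problems: list[str] = []
    if not isinstance(review.get("summary"), str):
        problems.append("'summary' must be a string")

    score = review.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 1 <= score <= 10:
        problems.append("'score' must be an integer from 1 to 10")

    issues = review.get("issues")
    if not isinstance(issues, list):
        problems.append("'issues' must be an array")
        return problems

    for i, issue in enumerate(issues):
        if not isinstance(issue, dict):
            problems.append(f"issues[{i}] is not an object")
            continue
        line = issue.get("line")
        if not isinstance(line, int) or isinstance(line, bool):
            problems.append(f"issues[{i}].line must be an integer")
        severity = issue.get("severity")
        if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
            problems.append(f"issues[{i}].severity must be critical, warning, or info")
        if not isinstance(issue.get("message"), str):
            problems.append(f"issues[{i}].message must be a string")
        # 'suggestion' is optional, but must be a string when present.
        if "suggestion" in issue and not isinstance(issue["suggestion"], str):
            problems.append(f"issues[{i}].suggestion must be a string")
    return problems
```

One option is to treat a non-empty problem list as a failed attempt and retry, the same way a JSON parse error is handled.
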
Most OpenAI-compatible local backends now support `response_format`. I tested this with a self-hosted endpoint I set up for internal use (pointed at an API like https://ai.interwestinfo.com/, but you can use any compatible provider). The code doesn't care whether the model lives on a GPU cluster or a Raspberry Pi.

I wrapped the above into a CLI that takes a file or piped diff:

```bash
$ cat example.diff | python review.py --endpoint https://my-ai-server.com --api-key $KEY --model code-review-7b
```

Output:

```
{
  "summary": "Adds authentication middleware",
  "issues": [
    {
      "line": 34,
      "severity": "critical",
      "message": "Hardcoded JWT secret",
      "suggestion": "Use environment variable"
    },
    {
      "line": 102,
      "severity": "info",
      "message": "Missing error handling for token expiry"
    }
  ],
  "score": 6
}
```

Now I can pipe any diff into this tool, get structured feedback, and integrate it into a GitHub Action or pre-commit hook.

Two lessons stuck with me. First, ask for structured output through `response_format` or function calling instead of parsing free text. Second, I didn't need a heavy framework: `response_format` already existed.

I'm still iterating on this. Next up is adding batch review of multiple files and caching results to speed up repeated checks. The whole repository is a single Python file and a Dockerfile. It's not a product, just a tool that scratched my itch.

How do you handle AI-powered code review in a privacy-sensitive environment? Are you running local models or trusting a provider? I'd love to hear what's working (or not) for you.
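
For the GitHub Action or pre-commit hook integration mentioned above, the structured output has to become a pass/fail signal at some point. Here is one way that could look, as a rough sketch rather than what the tool actually does: `gate`, `fail_on`, `min_score`, and `SEVERITY_RANK` are hypothetical names, and the input is assumed to be the JSON object shown in the example output.

```python
import sys

# Assumption: severities are ordered info < warning < critical.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def gate(review: dict, fail_on: str = "critical", min_score: int = 1) -> int:
    """Return a process exit code: 0 lets the change through, 1 blocks it."""
    threshold = SEVERITY_RANK[fail_on]
    blocking = [
        issue
        for issue in review.get("issues", [])
        # Unknown or missing severities are treated as "info".
        if SEVERITY_RANK.get(issue.get("severity"), 0) >= threshold
    ]
    for issue in blocking:
        print(
            f"line {issue.get('line')}: [{issue.get('severity')}] {issue.get('message')}",
            file=sys.stderr,
        )
    # Block on any issue at or above the threshold, or on a score below the minimum.
    if blocking or review.get("score", 10) < min_score:
        return 1
    return 0
```

A hook script could read the reviewer's JSON from stdin and call `sys.exit(gate(json.load(sys.stdin)))`. A non-zero exit code is what both pre-commit hooks and CI steps treat as a failure.
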