{"slug": "i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-here-s-what-worked", "title": "I Tried to Build an AI Code Reviewer Without Sharing My Code — Here's What Worked", "summary": "A developer built an AI code reviewer that analyzes pull requests without sending proprietary code to third-party services. After failing with local LLMs and finding existing solutions too expensive or complex, the engineer created a thin Python module that works with any API endpoint supporting the chat completions format. The solution uses structured JSON output and retry logic to produce reliable code reviews from self-hosted or local models.", "body_md": "A few weeks ago, I was staring down a pull request with 800+ lines of changes. The team was moving fast, and I wanted a quick sanity check on style, potential bugs, and security concerns. My first instinct? Ask ChatGPT. But then the paranoia set in: I'd be pasting proprietary code into a black box. Our legal team would have a heart attack. So I went looking for a way to run AI-powered code review without sending sensitive data to a third party.\n\nI'm not going to pretend this was a smooth ride. I banged my head against local LLMs, tried to wrangle Python scripts, and eventually landed on a pattern that actually works. This article is the one I wish I'd found back then.\n\nMy team uses GitHub Copilot for inline suggestions, but it sends code snippets to Microsoft's cloud. For our internal tools and customer-facing code, that's a hard no. I needed a review bot that could:\n\nI didn't want to maintain a Kubernetes cluster of GPUs, but I also didn't want to sign another enterprise agreement.\n\nI downloaded Llama 3 8B via Ollama. It ran on my laptop. I wrote a Python script to feed it a diff and ask for review. Results? Terrible. The model would hallucinate line numbers, ignore the prompt, and sometimes just ramble about \"great code!\" without any actionable feedback. 
Plus, I had to manually handle the API and parse text responses. Not reproducible.\n\nLangChain seemed like the obvious answer. I set up a chain with a prompt template and a local model. The setup was clunky, dependencies were heavy, and the structured output parser kept breaking. When it finally worked, the latency was 30+ seconds per review. Not usable for real-time PR checks.\n\nI evaluated a few providers that promised data privacy. Most required a yearly contract, a dedicated endpoint, and a minimum spend. For a solo developer or small team, overkill.\n\nI stepped back and realized the core problem wasn't the model — it was the integration pattern. I needed:\n\nI wrote a thin Python module that does exactly that. It works with any API endpoint that supports the chat completions format. Here's the core of it.\n\n``` python\nimport json\nimport time\nfrom typing import Optional\nimport httpx\n\nclass AICodeReviewer:\n    def __init__(self, api_url: str, api_key: str, model: str = \"gpt-4o\"):\n        self.client = httpx.Client(base_url=api_url, timeout=120.0)\n        self.api_key = api_key\n        self.model = model\n\n    def review_diff(self, diff: str, max_retries: int = 3) -> dict:\n        prompt = f\"\"\"\nYou are a senior code reviewer. 
Analyze the following diff and return a JSON object with:\n- 'summary': a short summary of the changes\n- 'issues': an array of objects each with 'line', 'severity' (critical/warning/info), 'message', and optionally 'suggestion'\n- 'score': an integer from 1 to 10\n\nDiff:\n{diff}\n\"\"\"\n        for attempt in range(max_retries):\n            try:\n                response = self.client.post(\n                    \"/v1/chat/completions\",\n                    headers={\"Authorization\": f\"Bearer {self.api_key}\"},\n                    json={\n                        \"model\": self.model,\n                        \"messages\": [{\"role\": \"user\", \"content\": prompt}],\n                        # Ask the backend to constrain the reply to a JSON object\n                        \"response_format\": {\"type\": \"json_object\"},\n                        \"max_tokens\": 2048\n                    }\n                )\n                response.raise_for_status()\n                data = response.json()\n                content = data[\"choices\"][0][\"message\"][\"content\"]\n                return json.loads(content)\n            except (httpx.HTTPError, json.JSONDecodeError, KeyError, IndexError) as e:\n                # httpx.HTTPError covers timeouts and connection failures as well as non-2xx responses\n                print(f\"Attempt {attempt+1} failed: {e}\")\n                # Exponential backoff (1s, 2s, 4s, ...), skipped after the final attempt\n                if attempt < max_retries - 1:\n                    time.sleep(2 ** attempt)\n        raise RuntimeError(\"All retries exhausted\")\n```\n\nThe `response_format` field is key: even smaller local models can produce valid JSON if you prompt them right and request structured output. (Most OpenAI-compatible local backends now support this.)\n\nI tested this with a self-hosted endpoint I set up for internal use (pointed at an API like `https://ai.interwestinfo.com/`, but you can use any compatible provider). 
The code doesn't care whether the model lives on a GPU cluster or a Raspberry Pi.\n\nI wrapped the above into a CLI that takes a file or piped diff:\n\n``` bash\n$ cat example.diff | python review.py --endpoint https://my-ai-server.com --api-key $KEY --model code-review-7b\n```\n\nOutput:\n\n```\n{\n  \"summary\": \"Adds authentication middleware\",\n  \"issues\": [\n    {\n      \"line\": 34,\n      \"severity\": \"critical\",\n      \"message\": \"Hardcoded JWT secret\",\n      \"suggestion\": \"Use environment variable\"\n    },\n    {\n      \"line\": 102,\n      \"severity\": \"info\",\n      \"message\": \"Missing error handling for token expiry\"\n    }\n  ],\n  \"score\": 6\n}\n```\n\nNow I can pipe any diff into this tool, get structured feedback, and integrate it into a GitHub Action or pre-commit hook.\n\n`response_format` or function calling.\n\n`response_format` already existed.\n\nI'm still iterating on this: next up is adding batch review of multiple files and caching results to speed up repeated checks. The whole repository is a single Python file and a Dockerfile. It's not a product, just a tool that solved my itch.\n\nHow do you handle AI-powered code review in a privacy-sensitive environment? Are you running local models or trusting a provider? 
I'd love to hear what's working (or not) for you.", "url": "https://wpnews.pro/news/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-here-s-what-worked", "canonical_source": "https://dev.to/__c1b9e06dc90a7e0a676b/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-heres-what-worked-212f", "published_at": "2026-06-04 02:00:49+00:00", "updated_at": "2026-06-04 02:12:28.545153+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "ai-safety", "ai-infrastructure", "generative-ai"], "entities": ["ChatGPT", "GitHub Copilot", "Microsoft", "Llama 3", "Ollama", "LangChain"], "alternates": {"html": "https://wpnews.pro/news/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-here-s-what-worked", "markdown": "https://wpnews.pro/news/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-here-s-what-worked.md", "text": "https://wpnews.pro/news/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-here-s-what-worked.txt", "jsonld": "https://wpnews.pro/news/i-tried-to-build-an-ai-code-reviewer-without-sharing-my-code-here-s-what-worked.jsonld"}}