{"slug": "building-a-code-reviewer-from-your-team-s-pr-history", "title": "Building a code reviewer from your team's PR history", "summary": "A staff engineer at an unnamed company built an AI code reviewer that mines three years of team PR comments to generate feedback in each engineer's voice, addressing the bottleneck of code review in the age of AI-generated code. The tool, open-sourced on GitHub, evolved from a generic checklist to a system that captures individual reviewer preferences, such as vetoing certain dependencies or calculating database query impacts.", "body_md": "That request you asked the platform team for? Yeah... Two months now, and it's still in their backlog. You're starting to take it personally.\n\nPlease don't. I managed a platform team back in 2017, and I can tell you firsthand: they are not slow or unresponsive, they are buried. Infra upgrades alone used to eat my team's whole week.\n\nAnd now, with the AI craze, it's even worse. Every new AI tool and integration falls onto the same people, and they were already the bottleneck. [Talos Linux](https://www.siderolabs.com?utm_source=newsletter.manager.dev&utm_medium=newsletter&utm_campaign=your-own-code-review) is the open-source OS that takes that weight off them. Built only for Kubernetes, it kills the upgrade pain and the 2am reboots (read [here](https://www.siderolabs.com/blog/infrastructure-brief-for-building-predictable-kubernetes-infrastructure-at-scale?utm_source=newsletter.manager.dev&utm_medium=newsletter&utm_campaign=your-own-code-review) how the magic happens). Less firefighting for them means your request finally moves.\n\nCode review became a pain in the ass.\n\nYou have engineers who just delegate decisions to Claude. You have non-engineers vibe-coding and expecting you to review and fix their mess. And even without them, every capable engineer produces more code now.\n\nMore code means more reviews, more context switches, less patience, and more shitty code slipping through. 
This results in more bugs, conflicting standards, confused LLMs, slower dev speed, and finally **even more pressure from above.**\n\nLast month, [Yaniv](https://www.linkedin.com/in/ayaniv/) (a staff engineer at my company) took a simple but ingenious approach to improving the situation on his team. Here’s the [open-source repo](https://github.com/ayaniv/t2a-review-template) - clone it and try it yourself.\n\nHe wrote a detailed [article on Medium](https://honeybook.engineering/we-accidentally-built-an-ai-code-reviewer-that-thinks-like-us-411f7c083629), and today I brought him in to share the TLDR version:\n\nMic to [Yaniv](https://www.linkedin.com/in/ayaniv/) 🎤\n\nLast month, our team had a session about our code review process. It was slow, caused lots of context switches, and had become our single biggest pain. We brainstormed some options, until our tech lead said: “What can we build to reduce this?”\n\nSo I took it on. I started with a generic pre-PR checklist and ended up with an AI reviewer that mines three years of our PR comments and writes feedback in each engineer's voice.\n\nThe version we have now reads PRs the way the team does. It knows that Iggy will veto any MobX dependency, and that Amit will calculate exactly how many DB queries your loop fires. It takes everyone's taste into account on every PR.\n\nHere’s how we got there, and the steps you can take to do it too:\n\n## V0: A generic checklist\n\nThe first version took a couple of hours. I built a Claude Code skill that anyone could run before sending a PR for review. It checked the diff against a markdown checklist specific to our team, mixed with company conventions and generic React best practices.\n\nI got some positive comments, but two problems came up: a checklist maintained by hand would go stale fast, and our EM said \"It's too generic, it's not really dedicated to our team.\" He was right. 
I had built what basically amounts to ESLint with a chat interface.\n\n## V1: Let’s mine the history\n\nI sat with it for a couple of hours - between meetings, at lunch, always in the back of my head.\n\n\"Dedicated to our team\" sounded like a lot of work. Hand-writing a custom checklist for each engineer meant interviewing everyone, codifying their preferences, and expecting them to keep those documents current - the exact maintenance problem we'd already identified.\n\nThen I had a real 'aha' moment. Every reviewer is different. Some obsess over tests, some spot architecture problems, some have strong opinions on readability or accessibility. And every one of them had left **thousands of real PR comments** over the years, capturing exactly what they push back on.\n\nAfter that, the implementation was pretty simple:\n\n1. Pull every PR comment via the GitHub API.\n2. Have Claude find each person's recurring patterns.\n3. Generate an .md profile per engineer with real PR citations as evidence.\n4. When you ran the skill, it read every profile and applied them all to your diff.\n\nThe result was a review that included all patterns from all team members - in one pass, before the human reviewers even opened the PR.\n\nThe team loved it. The feedback was specific, attributed - each finding cited which profile flagged it and which historical PR the pattern came from - and recognizably us. Engineers reading the output were saying \"yep, that's exactly what Amit would have said.\"\n\nBy the end of that first week, every engineer was running the skill before pushing PRs.\n\n## V2: We got greedy\n\nAn engineer on the team suggested that instead of one Claude call reading all the profiles together, we spin up a separate agent for each reviewer, all running at the same time. Then add a final \"skeptic\" agent that reads everyone's findings and throws out the false positives and duplicates.\n\nIt worked, but oh boy, the cost. 
On a 469-line PR, the thirteen agents ran in parallel on Opus and burned through 1 million tokens, costing roughly $20. The quality was great, but $20 per run was too expensive for us.\n\n## V2.1: Tiered intelligence\n\nTo address the issue, we broke the flow down into multiple steps:\n\nV2.1 starts with a fully deterministic step (yeah, there’s still a lot of room for those!) - plain code looks at which files changed and labels the PR as frontend, backend, or mixed. Then it drops the parts of each reviewer's profile that don't apply - on a frontend-only PR, the backend rules get stripped out before any agent sees them. That step is basically free and shrinks how much the models have to read.\n\nAfter that, three reviewer agents run in parallel, in a combination of Haiku (for simple reading-heavy passes) and Sonnet (for the part that needs real reasoning).\n\nFinally, Opus runs as the skeptic.\n\nOn the same 469-line PR that cost $20.79 before, V2.1 cost $2.99 - about 7× cheaper. And it actually found more: it caught 10 of the 12 V2 issues and added 7 new ones.\n\n## 3 takeaways for your own implementation:\n\n1. Your team’s **PR history is a gold mine**. The result is very different from anything a generic prompt produces.\n2. **Tier your intelligence by job**. Haiku for breadth and reading-heavy work, Sonnet for synthesis, Opus only as the skeptic at the end.\n3. An Opus skeptic pass solves hallucination better than tighter prompting. It’s easier to throw away five wrong findings than to make ten findings perfect. **Let the cheaper agents over-produce and the smartest agent prune.**\n\nYou can check out the [public template repo](https://github.com/ayaniv/t2a-review-template). 
It includes the directory structure, the pre-phase classifier, the orchestrator, and the prompt for generating a new reviewer profile from a GitHub handle.\n\nBuilding your team’s version should take an afternoon 🙂\n\nThanks, Yaniv, for a great and super useful idea!\n\nThis approach of course won’t solve 100% of your code review problem. LLMs are good at basic patterns, but the most useful code reviews challenge the decisions taken, not just implementation details.\n\nStill, even catching basic patterns can help reduce the cognitive load on reviewers, leaving them more attention for bigger issues.\n\n## Discover weekly\n\n1. [Why is Meta destroying its engineering organization?](https://newsletter.pragmaticengineer.com/p/why-is-meta-destroying-its-engineering) Another superb journalistic article by Gergely on what happens inside Meta.\n2. [Revised rules of engineering leadership.](https://archive.lethain.com/archive/revised-rules-of-engineering-leadership-198a/) Will Larson with 5 changes from the last few years. A must-read if you are a senior engineering leader thinking about how to rebuild your org.\n3. [Big crocs vs little crocs](https://marcrandolph.substack.com/p/big-crocs-vs-little-crocs). 
On speed and crocodiles.", "url": "https://wpnews.pro/news/building-a-code-reviewer-from-your-team-s-pr-history", "canonical_source": "https://newsletter.manager.dev/p/building-a-code-reviewer-from-your-team-s-pr-history", "published_at": "2026-06-23 06:01:00+00:00", "updated_at": "2026-06-24 00:48:51.801695+00:00", "lang": "en", "topics": ["ai-tools", "developer-tools", "large-language-models", "ai-agents"], "entities": ["Claude Code", "Yaniv", "HoneyBook", "Talos Linux", "Sidero Labs", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/building-a-code-reviewer-from-your-team-s-pr-history", "markdown": "https://wpnews.pro/news/building-a-code-reviewer-from-your-team-s-pr-history.md", "text": "https://wpnews.pro/news/building-a-code-reviewer-from-your-team-s-pr-history.txt", "jsonld": "https://wpnews.pro/news/building-a-code-reviewer-from-your-team-s-pr-history.jsonld"}}