Greptile's TREX pushes AI code review past reading diffs

wpnews.pro

Greptile, founded by Daksh Gupta, Soohoon Choi, and Vaishant Kameswaran, launched TREX, an execution layer for AI code review that runs pull request code and attaches artifacts to the review instead of only commenting on the diff.

The launch, detailed in a June 15 Greptile blog post, is Greptile's clearest move yet from AI reviewer to automated validation system. Two days later, Greptile engineer Shlok Mehrotra published an engineering post explaining how TREX was built: the main Greptile reviewer reads the diff, identifies behavior worth testing, and spins up parallel TREX subagents that run code in sandboxed environments. Those subagents return comments backed by logs, screenshots, API traces, execution scripts, and, for UI changes, video.

That distinction matters because most AI code review tools are still structurally close to the human ritual they are replacing: read the patch, infer risk, and leave a comment. Greptile is arguing that the next bottleneck is not review speed. It is proof. As AI coding agents increase the amount of code entering repositories, the valuable reviewer is the one that can show what happened when the code ran.

The founder bet is validation, not review

Greptile came out of the Georgia Tech ecosystem and participated in CREATE-X; it later joined Y Combinator's Winter 2024 batch. By January 2026, the Georgia Tech News Center reported the company had more than 2,000 customers.

The origin story explains why TREX is more than a feature toggle. Greptile began as a product for understanding codebases; the market has since moved toward agents that write, review, and ship code. That shift changes what a review product has to prove. A model can read a diff and infer that a page might break. A validation layer has to open the page, exercise the path, and show the evidence.

Greptile's launch materials state the thesis plainly: static review can reason about what code says, not what it does. Greptile says TREX, short for "Test, Run, Execute," is in public beta and is free to all Greptile users until the end of June 2026. After that, Greptile says it will cost $2 per run on top of the normal review cost. The company's pricing page lists the base Pro plan at $30 per seat per month, with 50 code reviews included per seat and $1 for each additional review.
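To make the pricing concrete, here is a rough back-of-the-envelope sketch that combines the quoted figures into a monthly estimate. The function name and the way the numbers combine are assumptions, not Greptile's billing logic: it assumes included reviews are pooled across seats and that each TREX run is billed once at the post-beta rate.

```python
# Figures quoted in the article; how they combine is an assumption.
SEAT_PRICE_USD = 30             # Pro plan, per seat per month
INCLUDED_REVIEWS_PER_SEAT = 50  # reviews included with each seat
EXTRA_REVIEW_USD = 1            # each review beyond the included amount
TREX_RUN_USD = 2                # per TREX run once the free beta period ends


def monthly_cost(seats: int, reviews: int, trex_runs: int) -> int:
    """Estimate a monthly bill in USD (hypothetical helper).

    Assumption: included reviews are pooled across all seats. The article
    only says that 50 reviews are included per seat.
    """
    if min(seats, reviews, trex_runs) < 0:
        raise ValueError("inputs must be non-negative")
    included = seats * INCLUDED_REVIEWS_PER_SEAT
    extra_reviews = max(0, reviews - included)
    return (
        seats * SEAT_PRICE_USD
        + extra_reviews * EXTRA_REVIEW_USD
        + trex_runs * TREX_RUN_USD
    )


if __name__ == "__main__":
    # 10 seats, 600 reviews (100 over the pooled allowance), 200 TREX runs
    print(monthly_cost(seats=10, reviews=600, trex_runs=200))
```

Under those assumptions, a 10-seat team running 600 reviews and 200 TREX runs would pay $300 for seats, $100 for extra reviews, and $400 for TREX, so execution would be the largest line item.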

Greptile also says reviews with TREX catch 20% more bugs in its evals. That is a company-reported number, and Greptile does not publish the full benchmark set in the launch post. The more useful read is the shape of the metric: Greptile is trying to compete on recall and evidence, not just latency or comment volume.

Why the first TREX did not work

Mehrotra's post is unusually candid about the false start. TREX began as a separate standalone agent that generated and ran tests, but Greptile found that generating tests was not the same as finding bugs. The tests were often irrelevant to the user's change, added noise, and still missed edge cases.

The team then tried collapsing review and execution into a single agent. That created the opposite problem: one agent had to read the diff, manage services, take screenshots, run tests, and hold all of that context at once. Greptile's current architecture splits the work. The primary reviewer acts as an orchestrator, while dedicated TREX subagents are scoped to specific issues and inherit the context the main reviewer has already gathered.
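The orchestrator-and-subagent split can be illustrated with a minimal sketch. Everything here is hypothetical: the `Issue` and `Finding` types, the `run_subagent` stub, and the use of `asyncio` are illustration choices, not Greptile's implementation. The point is only the shape: one coordinator fans out narrowly scoped workers, each carrying the context already gathered, and collects their results.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Issue:
    """One behavior the main reviewer wants verified (hypothetical type)."""
    title: str
    context: str  # context the main reviewer has already gathered


@dataclass
class Finding:
    """Result returned by one subagent (hypothetical type)."""
    issue: str
    passed: bool
    artifacts: list[str] = field(default_factory=list)


async def run_subagent(issue: Issue) -> Finding:
    """Stand-in for a sandboxed subagent scoped to a single issue."""
    await asyncio.sleep(0)  # placeholder for sandbox setup and execution
    log = f"log: exercised '{issue.title}' with context '{issue.context}'"
    # A real subagent would decide pass/fail from what it observed;
    # this stub always reports success.
    return Finding(issue=issue.title, passed=True, artifacts=[log])


async def review(issues: list[Issue]) -> list[Finding]:
    """Orchestrator: start one subagent per issue and gather the results."""
    return await asyncio.gather(*(run_subagent(i) for i in issues))


if __name__ == "__main__":
    issues = [
        Issue("checkout total updates", "diff touches cart logic"),
        Issue("flagged page renders", "feature sits behind auth and a flag"),
    ]
    for finding in asyncio.run(review(issues)):
        print(finding.issue, finding.passed, finding.artifacts)
```

Keeping each subagent scoped to one issue means no single worker has to hold the whole review in context, which is the problem the article says the single-agent design ran into.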

That architecture is the most important product decision in the launch. It frames TREX less as a generic testing bot and more as an execution system embedded inside review. In a UI example from the post, a feature hidden behind authentication requires environment setup, auth handling, and the right feature flag state. Greptile's claim is that the subagent can do that setup, render the feature, and return a screenshot instead of a text assertion.

Artifacts are the product surface

Greptile's early TREX output was a list of bullet points describing what had been tested. The company found that format was not enough. A failed checkout test, by itself, does not tell a developer whether the setup failed, the assertion was wrong, the environment was misconfigured, or the product actually broke.

The current artifact set is Greptile's answer to that trust problem. Logs show the runtime path. Screenshots and video show rendered behavior. API traces show request and response state. Execution scripts give humans and downstream agents a way to inspect how the result was produced.

That is a direct response to a failure mode that will become more common as agents do more engineering work: a confident model summarizing a test it did not really perform. Mehrotra writes that an early version of TREX sometimes claimed it had tested more thoroughly than it had. Bullet points made that hard to audit. Artifacts give the reviewer a way to check the work.
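One way to picture the audit that artifacts make possible is a check that flags any claim arriving without evidence. This is a sketch under stated assumptions: the `Artifact` and `Claim` types, the kind labels, and `unbacked_claims` are hypothetical names, and only the artifact categories themselves come from the article.

```python
from dataclasses import dataclass, field

# Artifact kinds named in the article; the string labels are assumptions.
ARTIFACT_KINDS = {"log", "screenshot", "video", "api_trace", "script"}


@dataclass
class Artifact:
    kind: str
    path: str  # hypothetical location of the stored file


@dataclass
class Claim:
    text: str
    artifacts: list[Artifact] = field(default_factory=list)


def unbacked_claims(claims: list[Claim]) -> list[str]:
    """Return the text of every claim with no recognized artifact attached."""
    return [
        c.text
        for c in claims
        if not any(a.kind in ARTIFACT_KINDS for a in c.artifacts)
    ]


if __name__ == "__main__":
    claims = [
        Claim("checkout flow returns 200", [Artifact("api_trace", "trace.json")]),
        Claim("tested all edge cases"),  # no evidence attached
    ]
    print(unbacked_claims(claims))
```

A check like this only confirms that evidence exists, not that it supports the claim; a human or downstream agent still has to read the log or watch the video.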

The money behind the move

Greptile has the financing to make an infrastructure-heavy bet. The company announced a $25 million Series A led by Benchmark Capital in September 2025, with continued support from Cory Levy, Y Combinator, and Initialized Capital. Georgia Tech reported in January 2026 that the round brought Greptile's total capital raised to $30 million and valued Greptile at $180 million.

That funding history matters because TREX is not a cheap text-generation feature. The product requires disposable sandboxes, repository-specific context, artifact storage, and model orchestration. That infrastructure makes Greptile's competitive posture different from static reviewers, linters, and PR summarizers. Greptile is trying to own the validation step that sits between code generation and merge. The company still has to prove that execution-backed reviews reduce real production risk often enough to justify added compute, time, and cost. But the direction is clear: if AI writes more of the code, the scarce layer becomes a system that can run it, inspect it, and produce evidence teams can trust.

The market question

Greptile is entering a crowded developer tooling lane that includes AI code review products, general coding agents, security scanners, and incumbent platforms adding review features to existing workflows. The pressure on all of them is the same: a model comment is only useful if an engineering team believes it enough to change behavior.

TREX is Greptile's answer to that trust gap. The launch does not eliminate the hard questions. Greptile's 20% bug-catch improvement is self-reported. The cost model adds another usage meter on top of code review. Sandboxed execution is powerful, but real enterprise repositories can be messy, stateful, permissioned, and slow to reproduce.

Still, TREX is a substantive product move because it targets the part of AI software development that has been underbuilt. The coding-agent wave has made it easier to create pull requests. Greptile is betting the more valuable company is the one that decides which of those pull requests can be merged.

[Greptile](https://www.greptile.com/?ref=runtimewire)

[Introducing TREX: Greptile Now Runs Your Code](https://www.greptile.com/blog/trex?ref=runtimewire)

[Building TREX: Code execution and artifact generation for AI code review](https://www.greptile.com/blog/trex-code-execution?ref=runtimewire)

[Greptile Pricing](https://www.greptile.com/pricing?ref=runtimewire)

[Greptile Raises $25M Series A and Launches v3](https://www.greptile.com/blog/series-a?ref=runtimewire)

[Greptile on Y Combinator](https://www.ycombinator.com/companies/greptile?ref=runtimewire)

[Georgia Tech News Center on Greptile](https://news.gatech.edu/news/2026/01/05/y-combinator-backing-and-30m-investment-take-startup-greptile-next-level?ref=runtimewire)

source & further reading

Original article: runtimewire.com