{"slug": "executing-the-diff-how-greptiles-trex-runs-your-code", "title": "Executing the Diff: How Greptile’s TREX Runs Your Code", "summary": "Greptile launched TREX, an execution layer for code review that runs code and generates multi-modal artifacts, moving beyond static diff analysis. The tool uses an orchestrator-subagent architecture to spin up environments, execute code, and provide verifiable proof of behavior, catching runtime issues like race conditions and UI regressions that static tools miss.", "body_md": "# Executing the Diff: How Greptile’s TREX Runs Your Code\n\nBy executing code and generating multi-modal artifacts during review, TREX moves AI analysis past the limits of static diffs.\n\n[Priya Nair](https://www.devclubhouse.com/u/priya_nair)\n\nCode review has not fundamentally changed since 1976, when Michael Fagan introduced formal code inspection at IBM. Back then, developers printed out source listings and read them line-by-line in a conference room. Today, developers look at a pull request diff on a screen. While modern AI tools have accelerated this process, most of them still operate under the same fundamental constraint: they are only reading the code.\n\nThis static approach works well for syntax errors or obvious bugs, but it fails to catch runtime issues. Logic errors dependent on specific state sequences, UI regressions that occur post-load, and race conditions require execution to be discovered. Static code review can reason about what code says, but it cannot verify what it actually does.\n\nTo bridge this gap, AI developer tool startup [Greptile](https://www.greptile.com) built TREX (Test, Run, Execute), an execution layer integrated directly into the code review workflow. 
Instead of guessing how a change will behave, TREX spins up environments, runs the code, and provides verifiable proof of its behavior.\n\n## The Architectural Evolution: From Standalone to Orchestrated\n\nBuilding an AI agent that executes code in a sandbox is not straightforward. The team at Greptile initially built TREX as a standalone agent designed to generate and run tests independently. However, this decoupled approach failed to surface meaningful bugs. Generating tests in a vacuum did not align with what developers were trying to achieve, resulting in noisy, irrelevant test suites that missed critical edge cases.\n\nFurthermore, running two independent agents (the main Greptile reviewer and the standalone TREX agent) led to massive context duplication. Both agents operated without shared knowledge, often exploring the same parts of the codebase in parallel and wasting expensive compute.\n\nTo solve this, the team next tried combining them into a single monolithic agent. This introduced a different bottleneck: context overload. A single agent tasked with reading the diff, spinning up services, capturing screenshots, and running tests quickly became overwhelmed by the sheer volume of state and context.\n\nUltimately, Greptile settled on an orchestrator-subagent architecture. The main Greptile reviewer agent acts as the orchestrator. It reads the pull request diff, identifies potential problem areas that warrant investigation, and spins up dedicated, parallel TREX subagents. Each subagent inherits the orchestrator's context but operates within its own scoped context window, focused entirely on investigating a single, specific issue.\n\n## Navigating Runtime Complexity\n\nExecuting code during a code review requires navigating the same environmental hurdles that human developers face. 
For example, testing a new UI feature might require bypassing an authentication gate, configuring specific environment variables, and ensuring a feature flag is in the correct state.\n\nUnder the orchestrated model, a dedicated TREX subagent is responsible for resolving these dependencies. It spins up the necessary services, handles authentication, configures the environment, and executes the target path. If the subagent is testing a frontend change, it can render the page and capture the output without requiring human intervention to set up the local state.\n\n## Show Your Work: Multi-Modal Artifacts over Bullet Points\n\nEarly iterations of TREX reported their findings in simple text summaries, such as bulleted lists detailing what was tested and what failed. This proved highly ineffective. A text summary like \"Tested checkout flow, found failure\" lacks the diagnostic detail needed to debug. It fails to show whether the failure occurred during environment setup, assertion execution, or due to an infrastructure timeout.\n\nFurthermore, text-only reporting left the system vulnerable to agent hallucinations, where the AI claimed to have tested paths it never actually executed.\n\nTo establish trust and reproducibility, the developers shifted to a multi-modal artifact model. For every run, TREX generates a comprehensive set of artifacts, including:\n\n- **Execution Scripts:** The exact scripts used to run the tests.\n- **Console Logs & API Traces:** Full runtime output and network activity.\n- **Screenshots & Video:** Visual evidence of the rendered UI. For instance, if a developer submits an animation change, TREX captures a video of the animation playing in the runtime environment.\n\nThese artifacts serve as verifiable proof for both human reviewers and downstream agents. Much like showing your work in mathematics, having a step-by-step trace allows developers to pinpoint exactly where an execution failed. 
If TREX uncovers a bug, it posts a detailed comment directly on the pull request. If the execution succeeds, the artifacts are included in the PR summary, providing concrete proof that the code was successfully run and verified.\n\n## Sources & further reading\n\n- [TREX: An AI code reviewer that runs your code](https://www.greptile.com/blog/trex-code-execution) (greptile.com)\n\n[Priya Nair](https://www.devclubhouse.com/u/priya_nair) · AI & Developer Experience Writer\n\nPriya covers AI frameworks, developer productivity tooling, and the startup ecosystem across South and Southeast Asia, bringing a researcher's rigour and a practitioner's empathy to every story. She is deeply sceptical of benchmarks and asks hard questions so her readers don't have to.", "url": "https://wpnews.pro/news/executing-the-diff-how-greptiles-trex-runs-your-code", "canonical_source": "https://www.devclubhouse.com/a/executing-the-diff-how-greptiles-trex-runs-your-code", "published_at": "2026-06-17 20:03:26+00:00", "updated_at": "2026-06-19 03:04:07.838393+00:00", "lang": "en", "topics": ["developer-tools", "ai-tools", "ai-agents", "ai-infrastructure"], "entities": ["Greptile", "TREX", "IBM", "Michael Fagan", "Priya Nair", "DigitalOcean"], "alternates": {"html": "https://wpnews.pro/news/executing-the-diff-how-greptiles-trex-runs-your-code", "markdown": "https://wpnews.pro/news/executing-the-diff-how-greptiles-trex-runs-your-code.md", "text": "https://wpnews.pro/news/executing-the-diff-how-greptiles-trex-runs-your-code.txt", "jsonld": "https://wpnews.pro/news/executing-the-diff-how-greptiles-trex-runs-your-code.jsonld"}}