Opus vs GLM-5.2 in a coding-agent pipeline: paired-run findings

A controlled A/B test comparing Claude Opus and GLM-5.2 in a coding-agent pipeline revealed qualitative differences in engineering behavior. Using the same paper-implementation pipeline across 10 repository forks, Anthropic's model produced scoping discussions while GLM-5.2 generated complete pull requests with code, tests, and documentation. The findings highlight distinct temperaments: one model defaults to analysis, the other to action.

This was a controlled A/B across 10 repository forks × 2 model providers, running an identical paper-implementation pipeline, remyxai/outrider (https://github.com/remyxai/outrider). The pipeline uses Claude Code under the hood, with glm-5.2 routed at z.ai's Coding Plan endpoint versus the default Anthropic endpoint. Each repo had the same paper pinned to it, with the same chain and the same prompt set, so the model is the only variable. The interesting findings aren't quantitative; they're qualitative differences in how each agent behaves when asked to do real engineering work.

The action under test is remyxai/outrider (https://github.com/remyxai/outrider); remyxai-cli (https://github.com/remyxai/remyxai-cli) installs the workflow on a target fork and dispatches pinned-paper runs:

Install Outrider on the target fork (one-time setup).

    remyxai outrider init --repo your-fork/repo --interest-id