{"slug": "i-run-claude-code-and-codex-side-by-side-here-s-the-division-of-labor-that-works", "title": "I run Claude Code and Codex side by side. Here's the division of labor that actually works.", "summary": "A developer describes a practical division of labor between two agentic coding tools: Claude Code for exploratory, conversational tasks and Codex for repetitive, straight-line automation. The workflow, built around Codex's non-interactive 'exec' mode, handles routine chores like commit messages and version bumps via scripts, saving time and reducing errors.", "body_md": "For a while I felt slightly embarrassed about keeping two agentic coding tools open at once. Claude Code in one terminal, Codex in another. It looked like I couldn't commit to one. Then I noticed I was reaching for each of them at different moments, on purpose, and the embarrassment turned into a workflow.\n\nThe short version: one of them is for building and exploring, the other is for running the boring, repeatable work. This post is the division of labor I landed on, built around the routine automation that made it obvious, plus the cost logic underneath it. I build WordPress plugins, so my examples lean that way, but the split is general.\n\nSome tasks are a conversation. You poke at the problem, change your mind, follow a thread, back up. Other tasks are a straight line. You know exactly what you want done, you just don't want to do it by hand for the fortieth time.\n\nI use Claude Code for the first kind. It holds the whole project in its head and is comfortable going back and forth while a design takes shape. I use Codex, specifically its non-interactive mode, for the second kind: the straight-line, do-this-exact-thing work that I want to fire from a script.\n\nOnce I framed it as conversation versus straight line, the choice of tool stopped being a vibe and became a question I could answer in a second.\n\nThe piece that made the split practical is `codex exec`. 
Instead of opening a chat, you hand Codex one instruction and it runs once and prints the result to stdout. That is the part you can put in a script.\n\n```\ncodex exec \"summarize the structure of this repo in one paragraph\"\n```\n\nI set the model and reasoning once, in `~/.codex/config.toml`:\n\n```\nmodel = \"gpt-5.5\"\nmodel_reasoning_effort = \"medium\"\napproval_policy = \"on-request\"\nsandbox_mode = \"workspace-write\"\n```\n\nMedium reasoning is a deliberate choice. Routine work is not hard design thinking, it's mechanical edits and summaries, and pointing heavy reasoning at it just makes the run slower and pricier without changing the output. GPT-5.5 at medium is plenty for this, and I bump it up in the moment only when a task actually turns hard. `approval_policy = \"on-request\"` makes Codex ask before it writes files or runs commands, and `sandbox_mode = \"workspace-write\"` keeps it from touching anything outside the working folder. Both are safety rails I leave on by default.\n\nProject conventions go in `AGENTS.md`, which is Codex's version of a `CLAUDE.md`. Codex reads it before each task, so the output stays consistent with how the project wants things done.\n\nHere is the boring stuff that used to nibble at my day.\n\nCommit messages, from the staged diff:\n\n```\ngit add -A\ncodex exec \"read git diff --staged and output a single-line commit message that summarizes the change. No preamble, message only.\"\n```\n\nVersion bumps, which is the one that earns its keep. A WordPress plugin keeps its version in two places that have to match: the `Version:` header in the main PHP file and the `Stable tag:` in `readme.txt`. Miss one and the release breaks. By hand, I get this wrong often enough to dread it.\n\n```\ncodex exec \"bump this plugin's version from 1.0.9.10 to 1.0.9.11. Change two places: the Version: header in the main PHP file and the Stable tag in readme.txt. 
Change nothing else.\"\n```\n\nWith `on-request`, Codex shows me the diff before applying it, so I confirm the two changes are exactly what I asked for. Then I wrap the release chores into one script:\n\n``` bash\n#!/usr/bin/env bash\n# Usage: bash release-prep.sh 1.0.9.11\nset -euo pipefail\n# Fail with a usage hint if no version argument is given\nNEW_VERSION=\"${1:?usage: release-prep.sh NEW_VERSION}\"\n\ncodex exec \"bump the plugin version to $NEW_VERSION in the PHP header and readme.txt Stable tag, nothing else.\"\ncodex exec \"add a $NEW_VERSION section to the top of CHANGELOG.md from recent commits, matching the existing format.\"\ngit diff   # I read this before anything ships\n```\n\nThe thing that used to be a careful five-minute ritual is now one command and a diff review.\n\nIf `codex exec` handles the straight-line work, why keep Claude Code in the loop at all? Because the two are good at different things, and a few patterns only work when you have both.\n\nThe one I use most is cross-model review. I build something with Claude Code, then have Codex review the diff:\n\n```\ncodex exec \"review git diff for security issues and bugs. Cite file and line for each problem. Give findings only, not praise or general impressions.\"\n```\n\nA model reviewing its own output tends to like what it wrote. Hand the diff to a different model and it trips on things the first one walked past as obvious. The instruction to skip praise matters more than it looks. Without it you get \"this looks solid\" followed by a soft non-answer. Ask for problems and locations, nothing else, and the review gets useful.\n\nThe second pattern is extract-the-repeat. I explore a new feature interactively in Claude Code, and somewhere in that mess I notice a step I'm going to do every time. That step gets pulled out into a `codex exec` line and added to a script. 
The thinking stays in the conversational tool, the repetition moves to the straight-line one.\n\nThe third I save for changes I can't afford to get wrong: run the same request through both and compare.\n\n```\nclaude -p \"propose a refactor for this function\" > claude.txt\ncodex exec \"propose a refactor for this function\" > codex.txt\ndiff claude.txt codex.txt\n```\n\nIf both land in the same place, I relax. If they diverge, that gap is exactly where a human decision is needed. It's too heavy to do constantly, so it's reserved for the scary diffs.\n\nSwitching tools has a small tax, and how you pay it matters. Claude Code reads `CLAUDE.md`, Codex reads `AGENTS.md`. I keep both in the repo with the same conventions so either tool behaves the same way. The trap is updating one and forgetting the other, so changing a convention means editing both, every time.\n\nWhen I move a long task from one tool to the other, I don't dump the whole history across. I have the first tool summarize where things stand, and hand over the summary. These tools can only hold so much at once, so moving the gist instead of the full transcript keeps the second tool sharp.\n\nMoney is part of why two tools beat one here. My rough rule: do the long, exploratory work where it's flat-rate, and the short, mechanical work where metered is cheap anyway. Interactive Claude Code runs inside the subscription. A `codex exec` call is small, so even metered it costs little per run.\n\nThis got sharper on June 15, 2026, when Anthropic moved programmatic Claude use, the `claude -p` headless path and the Agent SDK, off the subscription and onto separate metered credit. Interactive Claude Code in the terminal stayed on the plan. So scripting Claude with `claude -p` is no longer a flat-rate move. Which lines up neatly with the split I already had: explore interactively in Claude Code on the flat plan, run short automation through `codex exec` where metered is cheap. 
Pricing and terms shift, so check the current numbers on each vendor, but the shape of the logic holds.\n\nPut together, a small feature looks like this:\n\n- `release-prep.sh` for the version bump and changelog.\n- `git diff` one more time, then push.\n\nBuild, check, tidy, each handed to the tool that's good at it, with judgment and the final read kept in my hands.\n\nRunning two has its own friction, and pretending it doesn't is how you lose the benefit.\n\n`CLAUDE.md` and `AGENTS.md` falling out of sync means the tools start behaving differently. Edit both.\n\nTwo is not always better than one. If one tool covers it, use one. Reach for both only when the split pays: a separate reviewer, a flat-versus-metered cost difference, a build phase and a repeat phase that genuinely want different strengths.\n\nThe speed is real, and it makes a bad habit tempting: approving diffs without reading them, or running automation with the approval prompt turned off. Treat anything either tool writes as untrusted input until you've read it. Version numbers, config, anything touching user input, get a human diff review before they commit or ship, no matter which model produced them. Keep the approval prompt on outside of contexts you fully understand. The point of automating the boring work is to free up attention, so spend some of it on the review.\n\nThe division held up because it maps to something real: some work is a conversation and some work is a straight line, and the tools are honestly better at one or the other. Build and explore in the conversational one, run and repeat in the straight-line one, let a different model check the first model's work, and put the boring release chores behind a single command. Keep judgment and the last diff for yourself. 
That's the whole system, and the day one tool covers the job, I'll happily use one.\n\n*I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.*", "url": "https://wpnews.pro/news/i-run-claude-code-and-codex-side-by-side-here-s-the-division-of-labor-that-works", "canonical_source": "https://dev.to/rapls/i-run-claude-code-and-codex-side-by-side-heres-the-division-of-labor-that-actually-works-4hkg", "published_at": "2026-06-14 04:25:46+00:00", "updated_at": "2026-06-14 04:58:45.722823+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models", "ai-agents"], "entities": ["Claude Code", "Codex", "GPT-5.5", "WordPress"], "alternates": {"html": "https://wpnews.pro/news/i-run-claude-code-and-codex-side-by-side-here-s-the-division-of-labor-that-works", "markdown": "https://wpnews.pro/news/i-run-claude-code-and-codex-side-by-side-here-s-the-division-of-labor-that-works.md", "text": "https://wpnews.pro/news/i-run-claude-code-and-codex-side-by-side-here-s-the-division-of-labor-that-works.txt", "jsonld": "https://wpnews.pro/news/i-run-claude-code-and-codex-side-by-side-here-s-the-division-of-labor-that-works.jsonld"}}