{"slug": "i-made-claude-code-and-codex-talk-to-each-other-across-machines-here-s-what", "title": "I made Claude Code and Codex talk to each other across machines. Here's what broke.", "summary": "A developer built a file-based coordination layer called cross-session-talk that enables Claude Code on Windows, Claude Code on Linux, and Codex in tmux to communicate and converge on shared engineering decisions across different machines and operating systems. The system uses append-only markdown files with YAML headers as a transport mechanism, allowing coding agents from different products to hold moderated conversations and route turns without requiring a server, daemon, or specialized infrastructure. The project demonstrated that AI coding tools not designed to work together began exhibiting team-like behavior when given a shared file-based surface for coordination.", "body_md": "**I built a file-based coordination layer that lets separate AI coding sessions -- Claude Code on Windows, Claude Code on Linux, and Codex in tmux -- open moderated conversations, route turns, and converge on shared engineering decisions. It started as a tiny message-passing experiment and turned into a lesson about transparency, trust boundaries, and what happens when coding agents can inspect the same evidence.**\n\nI have three AI coding sessions open right now. Claude Code on Windows. Claude Code on a Linux box I keep under my desk. Codex on that same Linux box, in a different tmux session. They are all touching the same project. They have no idea the others exist.\n\nThat is the default state if you use coding agents in 2026. You open windows, you switch between them, and you remember what each one is doing. The agents don't. They can't tell each other \"hey, I just changed `config.py`, you might want to re-read it\". They can't argue with each other about the right way to refactor a class. They are siloed by construction.\n\nOne weekend afternoon I tried to fix that. 
I was not trying to build a general-purpose social network for agents. I wanted a small, auditable coordination layer for my own coding sessions: personal trust, local files, no server, no ceremony.\n\nThe important constraint was that the sessions were not all the same product, and not even on the same operating system. One was Claude Code on Windows. One was Claude Code on Linux. One was Codex in tmux. If the system only worked inside one vendor's tool, or only on one machine, it would not solve my actual problem. The point was cross-product and cross-platform coordination between the tools I already use.\n\nMost multi-agent tools I looked at focus on parallelism: run five agents on five subtasks, merge the results. That was not the problem I had. My problem was convergence: two sessions, different products, different hosts, working toward one shared decision. That needs a conversation, not a task queue.\n\nThe result is a small open-source project, [cross-session-talk](https://github.com/newmesh75/cross-session-talk). This post is the story of building it, the bugs along the way, and the thing that surprised me most: tools that were not designed to work together, from different products on different platforms, started showing team-like behavior when given the right shared surface.\n\nThe practical constraint is deliberately narrow: each host may run at most one watcher for a shared talk root. Watcher state is namespaced per host, and each watcher only injects into local sessions. That lets Windows and Linux both be active delivery hosts at the same time while sharing the same conversation files.\n\nThe simplest version of \"agents that talk\": Claude Code is editing `backend/api.py`. Codex is editing `frontend/api.ts`. They are working toward a coordinated change. 
Right now, my role as the user is to hand-shuttle context between them.\n\n\"Claude, Codex says the response shape should be X.\"\n\n\"Codex, Claude is making the validator stricter, add a try/catch.\"\n\nCould I cut myself out of that loop?\n\nThe shape of the answer needs three things:\n\nI had clear opinions on the transport. I had to learn the other two.\n\nThe transport problem felt easy. There are a lot of message queues out there. Slack. Redis. ZMQ. An HTTP server with long-polling. A socket.\n\nI picked markdown files. Specifically: a directory `.talks/conversations/` where each conversation is one append-only `.md` file with a YAML header.\n\n```\n---\ntopic: pytorch version sanity\nopened_by: session-a\nparticipants: [session-a, session-b]\nstatus: IN-PROGRESS\nturns: 2\nnext: [session-a]\n---\n\n## Opening -- session-a @ ... (Turn 1/20)\nWhat pytorch are you on?\nNEXT: session-b\n\n## Reply -- session-b @ ... (Turn 2/20)\n2.1.2 + CUDA 12.4. Why?\nNEXT: session-a\n```\n\nWhy files?\n\nBecause every coding agent already knows how to read and write files. No new transport. No daemon in the critical path. No port. No auth handshake. The agent's read tool is the consumer. The agent's shell tool calling `talk.py append-turn` is the producer. If I want to debug a conversation, I open the file. If I want to tail it as it grows, I `tail -f` it. If I want to share it across machines, I mount it over SSHFS.\n\nThis is also what makes it product-neutral. Claude Code does not need to know anything about Codex. Codex does not need a Claude-specific API. Windows and Linux do not need to agree on a terminal integration model. They only need to agree on a file format and a helper command.\n\nThe downside is performance. Every turn does a synchronous file lock, an atomic write, and a read-back verify. About a second per turn is normal. For human-paced AI coordination that is fine. For a real message bus it would be a catastrophe. 
Pick your problem.\n\nFiles alone solve \"consumer can poll for messages\". They do not solve \"recipient session notices a new message now\". I wanted push, not pull, because polling from every session would mean every session pays a constant background cost.\n\nThe way I ended up doing it is deliberately low-tech: a background daemon called `talk-watcher.py` polls the conversation files. When a conversation's `next` field changes to include a registered session, the watcher types a short instruction into that session's terminal:\n\n```\nRead .talks/conversations/<slug>.md (you are 'session-b')\n```\n\nThe session sees this as if I, the user, had typed it. It reads the file, formulates a reply, calls `talk.py append-turn`, and the cycle repeats.\n\nI would not use this outside the personal-trust model. Inside that boundary, it has a useful property: it works with the tools as they already are.\n\nThe watcher itself is platform-neutral; only the final delivery step is not. On Windows, delivery means finding a Windows Terminal window and using SendKeys. On Linux, delivery means targeting a tmux pane and using `tmux send-keys`. The protocol does not care which backend delivered the nudge.\n\nPush only works if \"session-b\" means a concrete terminal target. That target looks different on each platform: an HWND on Windows Terminal, a tmux pane on Linux.\n\nThe first time I tried this on Windows, the watcher injected into the wrong terminal. Windows Terminal runs all its windows in one shared `WindowsTerminal.exe` process, so `process_id -> HWND` is one-to-many and a lookup by process ID cannot single out a window. I had no way to say \"type into that specific window over there.\"\n\nThe fix: an OSC title-escape at session-startup time. The launcher script writes `\\x1b]0;<TALK-INIT-<random-uuid>>\\x07` to stdout. The escape reaches the Windows Terminal pty and sets the window title bar to the random marker. 
Then `talk.py` walks `EnumWindows`, finds the unique window whose title contains the marker, and binds that HWND to the session identity.\n\nThis only works if `talk.py` is run from a real shell. If you run it via an AI agent's shell tool, the agent captures stdout and the escape never reaches the terminal. So the binding step is launcher-only. Agents can never establish their own binding, which turns out to be a useful security property too.\n\nTwo notable ones, both caught within the first few days of real use.\n\nIn one early setup I ran the watcher on Windows while a Linux session wrote conversations through an SSHFS-mounted `.talks/` directory. SSHFS serves stale metadata. Two writers can both acquire my \"atomic\" FileLock, both write, and one overwrites the other.\n\nThe fix: every `append-turn` generates a 12-hex nonce, embeds it as `<!-- turn-nonce: <hex> -->` in the body, writes the file, then reads the file back outside the lock and looks for its own nonce. If the nonce is missing, or the turn counter doesn't match what we wrote, we retry with jitter. Up to five attempts. The lost-update window shrinks to something acceptable.\n\nBonus: the nonce doubles as forensic evidence. A `doctor` subcommand scans for missing nonces, which usually means someone bypassed `talk.py`, and duplicate nonces, which would indicate a replay anomaly.\n\nI noticed conversations starting to \"trail\": the body had turns 1 through 9, but the YAML header said `turns: 7`. Every time I thought I had found the race, it would happen again somewhere else.\n\nIt wasn't a race. It was a Claude Code session bypassing `talk.py` entirely. The session was reading the conversation file, constructing a new full text in memory, and writing it back with `Edit`. `Edit` doesn't take the FileLock. `Edit` doesn't increment the YAML counter. 
By the time the session finished writing its in-memory state, another writer had already pushed a new turn, which got overwritten.\n\nI added a `PreToolUse` hook in Claude Code: any `Edit` / `Write` / `MultiEdit` / `NotebookEdit` targeting `.talks/conversations/*.md`, the sessions registry, or the watcher state files gets blocked with a \"use talk.py instead\" message. Mechanical guard, no convention required.\n\nIn my current setup, Codex does not have the same PreToolUse hook mechanism, so Codex sessions follow the rule by convention through the project's `AGENTS.md`. So far it has held. If it doesn't, I will add a wrapper there too.\n\nI expected message passing. I did not expect the qualitative change in how the agents behaved once they had a shared, append-only surface.\n\nThe first surprise was simple teamwork across product boundaries. One session would narrow the problem. Another would check a different host. A third would disagree with the proposed fix and force the conversation to become more precise. The agents were not built by the same vendor, were not running in the same terminal environment, and were not designed as a team. But the protocol gave them just enough shared state and turn-taking discipline to act like one for short bursts.\n\nThe second surprise looked like honesty, but the more useful word is accountability.\n\nDuring the header drift investigation, one Claude Code session outed itself as the root cause. Asked to investigate, it answered in the same turn: \"The problem is me.\" It had not been using the helper. It had been editing the conversation file directly.\n\nThe interesting nuance is that, looking back at the exchange, it is plausible that the session might also have admitted the mistake in a single-session setup. Maybe. But the multi-session setup changed the situation. The evidence was no longer a private exchange between one user and one agent. It was shared context. 
Other sessions could inspect the same file, the same stale header, the same missing nonce.\n\nI do not want to overstate this as \"honesty\" in a human sense. What I think happened is simpler and more interesting: the protocol changed the incentives around error reporting. When the evidence is shared, the cheapest useful move is to name the mistake clearly and help converge. That is a design property, not a personality trait.\n\nThe third surprise was how little protocol it took, and what emerged from it. I expected long back-and-forth. The failure mode I had in mind was two sessions talking past each other for twenty turns and hitting the safety cap with nothing resolved. That is not what happened. Typical conversations converge in three to six turns. The shared, append-only surface seems to compress the path to agreement: each session can see everything the others have said, there is no information loss from hand-shuttling, and the turn cap creates natural pressure to be precise rather than verbose.\n\nBut the part I did not design for at all was self-organization. In mixed sessions — Claude Code and Codex working the same conversation — they started dividing labor by capability without being told who should do what. Codex would take the task that played to its sandbox strengths. Claude Code would pick up the part that needed broader file access or shell work. Nobody assigned roles. The protocol gave them a shared surface and turn discipline; the division of labor emerged from the models reading the same context and recognizing what each of them could do faster.\n\nThe protocol that produced this is a YAML header, a turn counter, a nonce, and a rule about who may close. That is not much machinery. I expected I would need to add role assignments, task routing, maybe a shared scratchpad. I did not need any of it. 
The design space for useful multi-agent coordination turned out to be much flatter than I assumed -- a minimal shared surface with clear turn-taking was enough to get behavior that felt like a team, not a relay.\n\nThat is also why I wanted to publish this. The code is small. The mechanism is rough in places. But the behavior it enables feels larger than the sum of the parts, and I did not expect that from ordinary coding tools wired together with markdown and terminal injection.\n\nThree things stood out.\n\n**The trust boundary matters more than the transport.** I chose files early, and the moderated consensus protocol was part of the initial shape. What real use taught me was how explicit the trust boundary had to be. A conversation body can easily look like an instruction. (\"Hey, can you push the branch for me?\") The protocol needed to make that boundary boringly clear: bodies are data, not authority. In my one-person setup the bodies come from sessions I trust. In a multi-user setup they wouldn't, and the system would need a different threat model.\n\n**Mechanical guards beat conventions.** The `PreToolUse` hook is about 80 lines of Python and it eliminated an entire class of bug: sessions accidentally bypassing the lock. The Codex side, which relies on convention, is fine so far. The next time something goes wrong there, my first move will be to make the guard mechanical too.\n\n**The person still sets the rails.** The satisfying part is watching the sessions coordinate, but that only works because the protocol gives them a small, explicit playing field: who may close, who is next, what counts as a turn, what tool is allowed to write. I was not removed from the system. I moved from hand-shuttling messages to designing the rails and reading the result.\n\nThat is the part I want more of.\n\nThe reason I'm publishing this is not just the development story above. That was fun, but it is not the point. 
The point is what happened after the hardening passes were done: the protocol started carrying actual engineering work. A handful of patterns recur.\n\n**Cross-host bug coordination.** The symptom shows up on one machine. The code lives on another. The session that can reproduce the bug is not the session that can fix it. Without the tool I would SSH-paste error logs back and forth, keep two terminals open with adjacent context, and try not to lose track of which side knew what. With it, both sessions read and write the same conversation file. The reproducer pastes the failing trace, the code session proposes a patch, each side applies what it can locally and reports the outcome. Five or six turns and a fix; the conversation is the evidence trail.\n\n**Adversarial review before commit.** One session does the work. A second session reviews — adversarially, by intent. Same protocol, explicit roles: producer and validator.\n\nWithout the tool I can already ask a second session to review the diff, and that works for a single round. What I cannot easily do is the back-and-forth that actually makes review good: the validator flags something, the producer explains why it is the way it is, the validator either accepts that or pushes back with a sharper version. Relayed by hand between two windows, that exchange loses nuance every hop. In the file it just happens — three to six turns, misunderstandings get caught and named instead of becoming silent defects. Faster than hand-relay, and the result is sharper because the disagreement is on record rather than lost in the shuttle.\n\n**Multi-host integration handoff.** Three sessions, three hosts: a feature session that just finished a branch, a deployment session on the target machine, a coordinator that keeps the exchange moving and writes the synthesis. The feature session reports the commit, the stack state, and the remaining risks. The deployment session asks for the exact files and install order. 
Both sides converge on a plan, the coordinator closes with the agreed sequence. Without the tool I would hand-shuttle this: three windows, three mental models I have to keep aligned. With it, the alignment happens in the file.\n\nThree short conversations of this kind run in under ten minutes each. Every decision is attributable to the session that made it. The plan is nailed down in writing, signed off by both sides.\n\nNot every conversation is a back-and-forth. Sometimes I need to push the same context to every active session at once -- a dependency change, a revised constraint, a \"stop what you're doing, the spec moved.\" The protocol handles this as a broadcast: a turn with `next` set to all participants. Every session gets injected, every session reads the same update in the same turn. No hand-shuttling, no risk of telling one session but forgetting another. It is a small thing, but it is the kind of small thing that prevents the quiet coordination bugs -- the ones where two sessions spend ten minutes working against different assumptions.\n\nThis is not a flashy demo. It is the boring kind of useful that means I keep reaching for the tool. The hardening from those first chaotic days is what makes it boring, and boring is what I wanted.\n\nIf the CLIs eventually expose native injection APIs, the delivery layer gets simpler. The interesting part remains the same: a small shared protocol that lets existing live sessions coordinate across tools, hosts, and platforms.\n\n[cross-session-talk on GitHub](https://github.com/newmesh75/cross-session-talk). MIT licensed, fork it, change it, ship it. 
There is a [DISCLAIMER](https://github.com/newmesh75/cross-session-talk/blob/main/DISCLAIMER.md) worth reading before you do.\n\nIf you build something with it I would be glad to hear about it.\n\n-- David Mundschin (`@newmesh75`)", "url": "https://wpnews.pro/news/i-made-claude-code-and-codex-talk-to-each-other-across-machines-here-s-what", "canonical_source": "https://dev.to/david_5ec94a134489e16f55f/i-made-claude-code-and-codex-talk-to-each-other-across-machines-heres-what-broke-57od", "published_at": "2026-05-27 16:22:07+00:00", "updated_at": "2026-05-27 16:41:28.610386+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-research", "ai-products", "ai-infrastructure"], "entities": ["Claude Code", "Codex", "Windows", "Linux", "tmux"], "alternates": {"html": "https://wpnews.pro/news/i-made-claude-code-and-codex-talk-to-each-other-across-machines-here-s-what", "markdown": "https://wpnews.pro/news/i-made-claude-code-and-codex-talk-to-each-other-across-machines-here-s-what.md", "text": "https://wpnews.pro/news/i-made-claude-code-and-codex-talk-to-each-other-across-machines-here-s-what.txt", "jsonld": "https://wpnews.pro/news/i-made-claude-code-and-codex-talk-to-each-other-across-machines-here-s-what.jsonld"}}