Codex-maxxing

Shifting from using coding agents solely for software development to applying them broadly for "knowledge work," such as creating presentations and spreadsheets. This behavioral change was driven by features like durable threads, voice input, and steering, which allow for long-running, continuous workstreams with shared memory. The author argues that these features transform the agent from a simple prompt-response tool into an "operating loop" for managing ongoing projects.

Codex-maxxing¶ I was already using coding agents a lot before Codex. Mostly, though, I used them through interfaces built for coding work: making diffs, changing repos, and shipping code. Around November, I started pushing them into knowledge work too. I made presentations in Slidev, used agents more like note-takers with voice input, and kept looking for other artifacts a coding agent could help me produce: an index.html , a PDF, a spreadsheet, a slide deck. The latest Codex app upgrades are the first thing I've used that make that broader mode feel native. Codex is still excellent for coding, but the more interesting shift is that it gives my work somewhere to live. What changed my behavior was learning to give work an operating loop: a durable thread, shared memory, tools that can act on my computer, ways to steer and resume the task, and a surface where I can review the artifact itself. Durable threads¶ The first thing that changed my behavior was compaction. Compaction - Compaction - Compressing a long-running thread so it can keep going without carrying every old message in full. I now keep a pinned thread for every important workstream I care about: - my Chief of Staff thread - the Agents SDK - the OpenAI CLI - Codex for open source - one just to monitor Twitter These are not short chats. They are megathreads that I have been compacting for months. They accumulate history, preferences, and old decisions that I do not want to recreate every time I come back. Pinned thread shortcuts You can jump directly to pinned threads with Command-1 through Command-9 . There is a tradeoff here. Long-running threads are not free. If you revisit them later, the conversation is probably no longer in cache, so you can incur more cost than you would in a fresh short thread. For workstreams I care about, continuity is worth it. Voice input¶ Voice input gets more of my actual thinking into Codex. The benefit is not speed. It is that the agent gets the unedited version of my thinking. Codex has built-in voice input, but I also use Wispr Flow because system-wide dictation changes how much context I can feed into everything else too. If I am planning a piece of work, I might say, "I think there is some guy named Ben in Slack who mentioned this, I do not remember exactly what, just go look." That sentence is too vague and annoying to type, but completely natural to say. The same thing applies to transcripts. If I want to write a post, I can call someone, record the conversation, or talk to them in person with Granola on my phone, then use the transcript as the starting material. A lot of plans get better when the model has access to the messy version of what I think, not just the polished one. Steering¶ Voice becomes more useful when combined with steering. Steering - Steering - Adding more direction while Codex is already working instead of waiting for the current step to finish. Steering lets you inject the next message after a tool call. If I am reviewing a website, I can keep talking while I look at it: - make this smaller - this copy is wrong - the spacing between these two things feels off - once this is done, open a PR - wait for the preview deploy - send the preview link to the person who needs to review it on Slack I do not need to wait for each step to finish before deciding the next one. I can keep adding intent while the agent is still working, then walk away with the queue already shaped. Later, Heartbeats can monitor the PR or Slack thread after I leave. The unit of work stops being "one prompt, one answer." It becomes a small operating loop. Memory¶ Once threads started lasting longer, they needed shared memory outside any one repo. The important move is not just preserving message history. A long thread can remember a lot, but that memory is trapped inside the thread unless the useful parts get serialized somewhere durable. The point of the memory system is to turn what the thread learns into an artifact I can inspect, edit, diff, and reuse. Most of my long-running threads start in an Obsidian vault: At the top level, I keep AGENTS.md instructions that say things like: as you learn more about people, make progress on projects, or close an open loop, update the relevant pages in the vault. The vault is where the agent lives, separate from any one project. Repositories hold code. The vault holds rolling context around my work: people, decisions, open loops, daily notes, project state, and the bits of understanding that would otherwise get lost between threads. I also keep the vault as a GitHub repo. That buys me two things: - it can work in the cloud - diffs become a review surface for memory When the agent updates the vault, I can read the diff and see what it thought was important enough to remember. That review step matters. I do not want evergreen threads to quietly accumulate vibes in conversation history. I want them to write down what changed: this person prefers this, this project is waiting on that, this decision was made, this loop is closed. This is also why I like memory as files. Files force the agent to compress experience into a form that can survive the thread. If the thread dies, compacts badly, or becomes too expensive to keep leaning on, the useful knowledge is still there. At that point, pinned threads start to feel less like chats and more like different workers reading from the same notebook. Codex also has first-party memory features in Settings Personalization Memories . I think of those as a local recall layer: useful for stable preferences, recurring workflows, project conventions, and known pitfalls, but not a replacement for checked-in instructions or an explicit vault. Chronicle is especially interesting here because it can use recent screen context to help build memories. I have not used it seriously yet, and the docs are clear that it is an opt-in research preview with real tradeoffs around permissions, rate limits, prompt injection, and unencrypted local memory files. But directionally it points at the same thing I care about: work should leave behind structured memory, not just a longer chat transcript. Shared memory - Shared memory - Context kept outside a single chat, such as notes in my vault that different threads can reuse. Computer and Browser Use¶ Once a thread has memory, the next question is what it can touch. The most useful distinction in my own head is: $browser is for local web surfaces I want to inspect and annotate@chrome is for signed-in browser state and multiple tabs@computer is for work that only exists as a GUI If I am iterating on a local app, I want $browser . If I need to work inside a logged-in browser session, I want @chrome . If the only way to do the task is to click through a desktop app, I want @computer . On my work machine, Twitter is logged into Safari. If I have @computer read Twitter there, I lose Safari while it works. @chrome is better when I want the agent to use several authenticated tabs in parallel without taking over the whole app I am using. Connectors extend that reach into the rest of my actual work. The ones I use most are $slack , $gmail , and $calendar , because Slack threads, inboxes, and calendars are where a lot of work shows up before it ever becomes code. Skills make repeated workflows reusable. Skill Creator and Skill Installer are a good place to start. Skill Installer lets you add OpenAI-recommended skills directly from the composer. After Codex pets launched, I used it to install the Hatch Pet skill, but the useful pattern is general: once you do something useful once, you can often package it so Codex can do it again without reteaching the workflow. Remote control¶ Remote control makes these longer loops feel portable. Codex can keep working from the machine where your files, permissions, and local setup already live, while you check in from mobile, review what it found, answer a question, approve the next step, or change direction without being back at your desk. OpenAI describes it as a way to work with Codex from anywhere. This matters most when Codex is already doing something long-running and you want to preserve momentum. You can start a task, walk away, then steer it from your phone when it reaches a decision point. That matters for the same reason pinned threads, voice, and Heartbeats matter. The work no longer has to pause just because I changed locations. A thread can keep going, and I can keep just enough attention on it to unblock the next move. Heartbeats¶ Pinned threads are useful, but they still wait for you to say something. Heartbeats are what make them recur. Heartbeats - Heartbeats - Recurring checks a thread can schedule for itself, like watching Slack or a pull request for new activity. A Heartbeat is a thread-local automation. You can say, "keep an eye on this every few hours," and the thread can schedule itself. A thread can have multiple schedules, run until some condition is met, and adjust its cadence over time. Chief of Staff¶ My Chief of Staff thread runs every 30 minutes: Every 30 minutes, check Slack and Gmail for unanswered messages that need my attention. Help me prioritize what matters most. If someone asks me a question, research the answer as deeply as you can and draft a reply for me, but do not send it. When I come back to Slack, replies are often already sitting in drafts. I still decide what gets sent, but the expensive part of gathering context is done. Monitor for feedback¶ The same pattern works for review loops. A Heartbeat can monitor Google Docs comments, pull request comments, or Slack replies and keep work moving as feedback arrives. One of my favorite examples came from an animation project. I had posted a video in Slack and asked Codex to check the thread every 15 minutes for feedback, re-render a new version when comments came in, and reply back into the thread tagging the reviewer. The Slack MCP server could not upload files, so the agent used @computer to press the Add file button and post the revised render anyway. The interesting part is not just that it checked Slack every 15 minutes. The loop crossed tool boundaries: Slack for feedback, Remotion for the render, @computer for the upload. That is when Heartbeats, connectors, and computer use stop feeling like separate features. Together they become a feedback loop that keeps running without me sitting there. Get a refund¶ Recently I had a package stolen. Amazon told me it would take about 25 minutes to talk to a person, so I created a thread with @computer and told it: Every 5 minutes, check whether the customer support agent has joined this thread. If they have, do your best to get me a refund. Once they reply, switch to checking every minute so you can respond faster. By the time I got out of the shower, the refund was done. Many of my Heartbeats also update my Obsidian vault as a kind of explicit memory. Goals¶ The newest thing I am still learning how to use well is Goals. Goals - Goals - Longer-running tasks with a real finish line Codex can keep pushing toward. You should be ambitious with them. A weak goal is "implement the plan in this Markdown file." A strong goal has a real success criterion that the agent can keep pushing against. Last week I tried to migrate the Python Rich library into Rust. Because the original project already had a large unit test suite, I could set a goal like: migrate Rich into Rust, but it must pass all the unit tests from the original library. That test suite gave the run a real oracle: the Rust port was not done until it passed the same tests as the Python library. This is different from having a long conversation with an AI, accumulating a Markdown plan, and then eventually saying, "implement this." Execution is only as good as the goal and the verification you give it. Ambition without verification is just a wish. The side panel¶ The part of Codex I am most excited about is the side panel. It is easy to think of this as a place where previews