
Why codex /goal fails on complex workflows: compaction amnesia and context rot

OpenAI's `/goal` feature for Codex suffers from "compaction amnesia" and "context rot," causing significant performance degradation on complex, multi-issue workflows due to flawed context management. A developer created an open-source Rust utility called Nightshift to solve this by isolating tasks into fresh agent sessions with no memory of previous runs, managing state through filesystem and git operations. The tool supports multiple coding agents and aims to provide deterministic scheduling and failure isolation for long-horizon coding tasks.

3 min read · published May 26, 2026

Hi HN,

When OpenAI released /goal earlier this month, I was really excited to try it for long-horizon tasks. After using it, though, it didn't blow me away, so I did some digging and found a major architectural flaw when using it for complex multi-issue workflows: context rot.

This isn't anything new, but given how OpenAI positioned this feature to developers, I was let down by how they'd implemented context management.

Though /goal is a step forward in long-horizon coding, it lacks task decomposition and proper handling of context. It uses a multi-tier approach that includes persistent context chaining (PCC) to memory, local vector embeddings for RAG, sliding windows, and compaction.

In principle, giving Codex a directive like "/goal work towards closing my open issues on github" should work, but this execution model hits a fatal wall. Even with massive context windows and RAG, LLM reasoning quality degrades significantly beyond roughly 100-150k tokens. The agent keeps working with worsening performance and finally, to prevent token exhaustion, uses compaction to summarize old logs. In practice, this causes compaction amnesia: the model is asked to summarize a massive blob of mixed-relevance information when its reasoning quality is already at its lowest. The compaction forgets critical constraints, opens the door to hallucinated past decisions, and introduces noise that makes the new context unreliable for long-horizon work.
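
To make the argument concrete, here is a rough back-of-the-envelope sketch in Python. The per-issue token figures are invented for illustration, the 150k threshold is simply the upper end of the range mentioned above rather than a measured limit, and the single-session model ignores compaction entirely, so treat it as a picture of the trend rather than a description of how Codex behaves:

```python
from itertools import accumulate

# Hypothetical token cost of working each issue (file reads, tool output, diffs).
ISSUE_TOKENS = [40_000, 55_000, 30_000, 60_000, 45_000]
# Upper end of the range quoted in the post; an assumption, not a measurement.
DEGRADATION_THRESHOLD = 150_000


def context_single_session(costs):
    """One long session: every issue's tokens pile onto the same context."""
    return list(accumulate(costs))


def context_fresh_sessions(costs):
    """One session per issue: each context only ever holds its own issue."""
    return list(costs)


def first_issue_over(contexts, threshold):
    """Index of the first issue worked with an over-threshold context, or None."""
    for index, size in enumerate(contexts):
        if size > threshold:
            return index
    return None


single = context_single_session(ISSUE_TOKENS)
fresh = context_fresh_sessions(ISSUE_TOKENS)

print("single session:", single, "first over:", first_issue_over(single, DEGRADATION_THRESHOLD))
print("fresh sessions:", fresh, "first over:", first_issue_over(fresh, DEGRADATION_THRESHOLD))
```

With these made-up numbers the single session crosses the threshold on the fourth issue, while no isolated session comes close.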

I wanted to see if enforcing strict outer-loop boundaries would solve this, so I put together an open-source Rust utility called Nightshift (https://github.com/Shaurya-Sethi/nightshift) to test this theory. Instead of running a single long-running session, it isolates the work like this:

  1. You write a PRD as a parent GitHub issue that defines what needs to be implemented and break it down into vertically sliced child issues with explicit kanban-style dependencies.
  2. You run nightshift --prd 1 --agent
  3. nightshift utilises gh cli to resolve the dependency graph and pick the next unblocked issue.
  4. It syncs the repo, puts together essential context for just that issue and starts a new agent session, piping the PRD and issue context directly to stdin for the agent to pick up.
  5. The agent is now responsible for the usual coding: new feature branch, implementation and testing, PR and self-review, and finally closing the issue.
  6. nightshift finds the next unblocked issue after maintaining git hygiene and loops until all issues linked to the PRD are resolved.
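
The "pick the next unblocked issue" step can be sketched as follows. This is not Nightshift's code (the tool is written in Rust and reads issues through gh cli); it is a minimal Python illustration over in-memory data. The `Issue` class, the `blocked_by` field, and the lowest-number-first tie-break are all assumptions made for the example:

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical stand-in for a child issue linked to the PRD."""
    number: int
    closed: bool = False
    blocked_by: list = field(default_factory=list)


def next_unblocked(issues):
    """Return the lowest-numbered open issue whose blockers are all closed.

    A blocker that is not part of the given issue set is treated as
    unresolved, so the dependent issue stays blocked.
    """
    by_number = {issue.number: issue for issue in issues}
    for issue in sorted(issues, key=lambda i: i.number):
        if issue.closed:
            continue
        if all(b in by_number and by_number[b].closed for b in issue.blocked_by):
            return issue
    return None  # everything is closed, or everything left is blocked


issues = [
    Issue(2),
    Issue(3, blocked_by=[2]),
    Issue(4, blocked_by=[2, 3]),
]

first = next_unblocked(issues)   # issue 2: it has no blockers
first.closed = True
second = next_unblocked(issues)  # issue 3: its only blocker is now closed
```

Sorting by issue number before scanning is what makes the choice deterministic here: the same set of open and closed issues always yields the same next pick.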

It's a very simple orchestration. The agent has no memory of previous runs and it doesn't need to: each task is isolated and gets a fresh agent session. The state is managed entirely through filesystem and git operations, and you get deterministic scheduling, failure isolation, and robust autonomy.
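
A minimal sketch of that outer loop, in Python rather than the tool's Rust, under stated assumptions: each issue gets a brand-new process with its prompt piped to stdin, and the loop stops at the first failing issue, which is one plausible reading of "failure isolation" and not necessarily what Nightshift does. `STAND_IN_AGENT` is a hypothetical placeholder command that only reads stdin; a real run would substitute the actual agent CLI:

```python
import subprocess
import sys

# Hypothetical stand-in for an agent CLI: exits non-zero if the prompt contains "FAIL".
STAND_IN_AGENT = [
    sys.executable,
    "-c",
    "import sys; sys.exit(1 if 'FAIL' in sys.stdin.read() else 0)",
]


def run_fresh_session(agent_cmd, prompt):
    """Start a new agent process and pipe the prompt to its stdin.

    Nothing is carried over between calls: each call is a separate process.
    Returns True when the process exits with status 0.
    """
    result = subprocess.run(agent_cmd, input=prompt, text=True, capture_output=True)
    return result.returncode == 0


def run_queue(issue_prompts, agent_cmd):
    """Work issues in order, one fresh session each.

    Returns (completed_issue_numbers, failed_issue_number_or_None). The loop
    stops at the first failure, so later issues are never started on top of it.
    """
    completed = []
    for number, prompt in issue_prompts:
        if not run_fresh_session(agent_cmd, prompt):
            return completed, number
        completed.append(number)
    return completed, None


prompts = [(1, "PRD + issue 1"), (2, "PRD + issue 2 FAIL"), (3, "PRD + issue 3")]
completed, failed = run_queue(prompts, STAND_IN_AGENT)
print("completed:", completed, "failed:", failed)
```

Because the only thing handed to each session is its prompt, anything a later issue needs from an earlier one has to come from the repository itself, which matches the post's point that state lives in the filesystem and git.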

It currently supports claude-code, codex, cursor, antigravity, and pi coding agent, and I'm working on adding support for more agents as this project grows. It's totally open-source if you want to inspect how the session management is implemented.

I'd love to hear your thoughts on this and check out your experiments with long-horizon task orchestration. Maybe the way forward is combining macro management with micro management?

I truly believe that by adding a dynamic task decomposition orchestrator that manages individual agents, /goal would solve half its problems.

Thanks!

Comments URL: [https://news.ycombinator.com/item?id=48275853](https://news.ycombinator.com/item?id=48275853)

Points: 1
