Why codex /goal fails on complex workflows: compaction amnesia and context rot

[ARTICLE · art-14133] src=news.ycombinator.com ↗ pub=2026-05-26T06:33Z topic=large-language-models verified=true sentiment=↓ negative

OpenAI's `/goal` feature for Codex suffers from "compaction amnesia" and "context rot," causing significant performance degradation on complex, multi-issue workflows due to flawed context management. A developer created an open-source Rust utility called Nightshift to solve this by isolating tasks into fresh agent sessions with no memory of previous runs, managing state through filesystem and git operations. The tool supports multiple coding agents and aims to provide deterministic scheduling and failure isolation for long-horizon coding tasks.

Published May 26, 2026 · 3 min read · 14 views

Hi HN,

When OpenAI released /goal earlier this month, I was really excited to try it for long-horizon tasks. After using it, though, it didn't blow me away, so I did some digging and found a major architectural flaw when it is used for complex multi-issue workflows: context rot.

This isn't anything new, but given how OpenAI positioned this feature to developers, I was let down by how they implemented context management.

Though /goal is a step forward in long-horizon coding, it lacks task decomposition and proper handling of context. It uses a multi-tier approach that includes persistent context chaining (PCC) to memory, local vector embeddings for RAG, sliding windows, and compaction.

In principle, giving Codex a directive like "/goal work towards closing my open issues on github" should work, but this specific execution model hits a fatal wall. Even with massive context windows and RAG, LLM reasoning quality degrades significantly beyond roughly 100-150k tokens. The agent keeps working with worsening performance and finally, to prevent token exhaustion, uses compaction to summarize old logs. In practice, this causes compaction amnesia: the model is asked to summarize a massive blob of mixed-relevance information when its reasoning quality is already at its lowest. This compaction leads to forgetting critical constraints, opens the door to hallucinated past decisions, and introduces noise that makes the new context unreliable for long-horizon work.
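To make the arithmetic concrete, here is a rough sketch of how one long session drifts past the degradation range while per-issue sessions stay under it. The per-issue token counts and the helper name are made up for illustration, and the model deliberately ignores sliding windows and compaction; only the roughly 100k-token lower bound comes from the observation above.

```python
from itertools import accumulate

# Lower end of the range where reasoning quality was observed to degrade.
DEGRADATION_THRESHOLD = 100_000

# Hypothetical token cost of working through each issue (illustrative numbers only).
tokens_per_issue = [40_000, 35_000, 50_000, 30_000]


def first_degraded_issue(context_sizes, threshold=DEGRADATION_THRESHOLD):
    """Return the 1-based index of the first issue whose context exceeds the threshold, or None."""
    for index, size in enumerate(context_sizes, start=1):
        if size > threshold:
            return index
    return None


# One long session: context accumulates across issues.
single_session = list(accumulate(tokens_per_issue))
# Fresh session per issue: context starts from zero each time.
isolated_sessions = list(tokens_per_issue)

print(single_session)                           # [40000, 75000, 125000, 155000]
print(first_degraded_issue(single_session))     # 3
print(first_degraded_issue(isolated_sessions))  # None
```

With these assumed numbers the single session is already in the degraded range by the third issue, which is exactly when it would also be asked to compact.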

I wanted to see if enforcing strict outer-loop boundaries would solve this, so I put together an open-source Rust utility called Nightshift (https://github.com/Shaurya-Sethi/nightshift) to test this theory. Instead of running a single long-running session, it isolates the work like this:

1. You write a PRD as a parent GitHub issue that defines what needs to be implemented, and break it down into vertically sliced child issues with explicit kanban-style dependencies.
2. You run nightshift --prd 1 --agent
3. Nightshift uses the gh CLI to resolve the dependency graph and pick the next unblocked issue.
4. It syncs the repo, puts together the essential context for just that issue, and starts a new agent session, piping the PRD and issue context directly to stdin for the agent to pick up.
5. The agent is now responsible for the usual coding work: new feature branch, implementation and testing, PR and self-review, and finally closing the issue.
6. Nightshift finds the next unblocked issue after maintaining git hygiene and loops until all issues linked to the PRD are resolved.
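The "pick the next unblocked issue" step can be sketched in a few lines. This is a minimal illustration and not Nightshift's actual code: the Issue class, the blocked_by field, and the lowest-number tie-break are all assumptions, and a real run would read issue state from GitHub instead of an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical in-memory view of a child issue (not Nightshift's data model)."""
    number: int
    blocked_by: set[int] = field(default_factory=set)
    closed: bool = False


def next_unblocked(issues):
    """Return the lowest-numbered open issue whose blockers are all closed, or None."""
    closed = {issue.number for issue in issues if issue.closed}
    candidates = [
        issue for issue in issues
        if not issue.closed and issue.blocked_by <= closed
    ]
    return min(candidates, key=lambda issue: issue.number, default=None)


def run_order(issues):
    """Simulate the outer loop: pick an issue, mark it resolved, repeat."""
    order = []
    while (issue := next_unblocked(issues)) is not None:
        order.append(issue.number)
        issue.closed = True  # stand-in for the agent closing the issue
    return order


issues = [
    Issue(2),
    Issue(3, blocked_by={4}),
    Issue(4, blocked_by={2}),
    Issue(5, blocked_by={3}),
]
print(run_order(issues))  # [2, 4, 3, 5]
```

Because the pick depends only on which issues are open and what blocks them, the same issue state always yields the same next task.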

It's a very simple orchestration. The agent has no memory of previous runs and it doesn't need to: each task is isolated and gets a fresh agent session. The state is managed entirely through filesystem and git operations, and you get deterministic scheduling, failure isolation, and robust autonomy.

It currently supports claude-code, codex, cursor, antigravity, and pi coding agent, and I'm working on adding support for more agents as this project grows. It's totally open-source if you want to inspect how the session management is implemented.
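For anyone curious what "fresh session per task" looks like mechanically, here is a small sketch under stated assumptions. The agent is faked with a tiny Python child process so the example runs anywhere; FAKE_AGENT_CMD, build_context, and run_isolated are hypothetical names, and the real tool would launch an actual agent CLI instead.

```python
import subprocess
import sys

# Stand-in "agent": a child process that reads its whole prompt from stdin and
# reports how much context it received. A real run would invoke an agent CLI here.
FAKE_AGENT_CMD = [
    sys.executable,
    "-c",
    "import sys; prompt = sys.stdin.read(); print(f'received {len(prompt)} chars')",
]


def build_context(prd: str, issue: str) -> str:
    """Assemble only the PRD and the current issue; nothing from earlier runs."""
    return f"PRD:\n{prd}\n\nISSUE:\n{issue}\n"


def run_isolated(prd: str, issue: str, agent_cmd=FAKE_AGENT_CMD) -> tuple[int, str]:
    """Start a brand-new process for one issue and pipe its context to stdin."""
    result = subprocess.run(
        agent_cmd,
        input=build_context(prd, issue),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


prd = "Add CSV export"
for issue in ["Add export endpoint", "Add download button"]:
    code, output = run_isolated(prd, issue)
    # A non-zero exit code affects only this issue, not the sessions around it.
    print(issue, code, output)
```

Each call gets a new process with only the PRD and one issue in its input, so nothing carries over between tasks except whatever is written to disk or git.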

I'd love to hear your thoughts on this and check out your experiments with long-horizon task orchestration. Maybe the way forward is combining macro management with micro management?

I truly believe that by adding a dynamic task decomposition orchestrator that manages individual agents, /goal would solve half its problems.

Thanks!

Comments URL: [https://news.ycombinator.com/item?id=48275853](https://news.ycombinator.com/item?id=48275853)
