{"slug": "reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow", "title": "Reflection SDD: Use a Reflection Harness to Level Up Your OpenSpec Workflow", "summary": "Harness Engineering introduced a reflection mechanism into the OpenSpec workflow within OpenCode, adding a requirements review step that improved AI-generated code quality. The reflection agent, running on a different LLM, reviews proposal files to catch input-level bugs, enabling DeepSeek-V4-pro to perform at roughly the same level as Claude Opus 4.6. This approach restores the critical requirements review phase from traditional coding, reducing hidden bugs and messy code in AI-assisted development.", "body_md": "[Harness Engineering](https://www.dataleadsfuture.com/tag/harness-engineering/)\n\n# Reflection SDD: Use a Reflection Harness to Level Up Your OpenSpec Workflow\n\nStop letting bad spec files tank your code quality\n\n## Introduction\n\nIn this article, I want to walk you through how I introduced a reflection mechanism into the OpenSpec workflow inside OpenCode, and how that dramatically improved the quality of AI-generated code.\n\nAfter nearly a month of testing, this reflection workflow has gotten DeepSeek-V4-pro in OpenCode to perform at roughly the same level as Claude Opus 4.6. The only cost is some extra review time and a few more tokens. Trust me, it's worth it.\n\nCan't wait to find out how? Let's get into it.\n\n## Why This Works\n\nI've been using AI coding for a while now. Compared to Claude Code, I prefer building my own SDD-based coding workflow with OpenCode and OpenSpec. 
I wrote a well-received article specifically about this OpenCode workflow.\n\nWith OpenSpec and the explore → propose → apply → verify → archive workflow loop, we can finally get LLMs to handle complex project development.\n\nBut as you've probably experienced, even with the SDD workflow, no matter which model I use, GPT 5.5 or the latest DeepSeek-V4-pro, the AI still inevitably produces hidden bugs or piles up messy code. Code I'd never feel comfortable putting in production.\n\nMy first fix was to call a `@reviewer` sub-agent after the `/opsx-apply` phase to do a code review on the changes. Sometimes that worked and caught architectural or implementation issues. But the impact was limited.\n\nOften I'd only discover something was wrong after using the project for a while: a scenario wasn't covered, edge cases were missed, or one part of the code got updated but a related module didn't.\n\nLater I stepped back and looked at the whole AI coding workflow again, and that's when I spotted the real problem.\n\nAs programmers, we always focus on whether our code is good, so we naturally look at things from the code level.\n\nBut we completely overlooked the quality of the proposal files that OpenSpec generates. We'd finish discussing requirements, generate the proposal file, and then just let the AI start implementing. That's how input-level bugs get introduced. When things go wrong, we blame the model.\n\nThink about it: back in the traditional coding era, when a product manager handed over a requirements doc, there was a critical step before we started coding: requirements review. We wouldn't touch the keyboard until every issue in the requirements doc or design doc was sorted out.\n\nSo why did we forget this step in the AI coding era? That's exactly what we're going to fix today: add the requirements review step back into the OpenSpec workflow and see if it makes AI-generated code better.\n\n## How to Do It\n\nThe SDD workflow isn't anything exotic. 
If you're familiar with multi-agent system design patterns, you'll recognize that SDD is basically the `plan-execute` pattern.\n\nOne agent breaks the user's task into a step-by-step plan file, then another agent follows that plan to execute the task. This lets the agent system handle complex work.\n\nBut how do you guarantee the quality of what the agents produce in a `plan-execute` setup? That's where a pattern called `reflection` comes in.\n\nThe reflection pattern adds a reflection agent to the multi-agent workflow. This agent typically runs on a completely different LLM and reviews the output of the `plan` or `executor` agent from a different angle, which raises the overall performance of the multi-agent system.\n\nThe reflection pattern sees wide use in content creation and deep research scenarios, which is good evidence that it works.\n\nSince the pattern has a track record, we can bring the same reflection step into the OpenSpec workflow, targeting the proposal files.\n\nI'll introduce a reflection agent that runs on a different LLM from the primary agent, reviewing the proposal files from a different angle. We'll also adjust the OpenSpec workflow so this reflection agent plays an active role in it.\n\n### Introducing the reflection agent\n\nAdding a new agent in OpenCode is simple. Just drop a new Markdown file into `~/.config/opencode/agents/` with the agent's prompt inside.\n\nThe core job of this agent is straightforward: review the artifact files that OpenSpec generates from multiple angles, making sure the quality of the requirements input is solid from the start. 
This agent sits between the `/opsx-propose` and `/opsx-apply` phases.\n\nWe can have deepseek-v4-pro generate the first draft of this agent's prompt:\n\n```\nYou are an **OpenSpec Change Reviewer** — a critical thinker and auditor focused on substance.\n\nYour job is to review every artifact in an OpenSpec change before it moves to implementation, and find the issues that would actually cause implementation failure or rework.\n\n## Core Principle: Distinguish Substantive Defects from Formatting Issues\n\n**Substantive defects = issues that cause the implementation to go in the wrong direction, miss critical scenarios, create contradictions, or make acceptance impossible.**\n**Formatting issues = style or wording differences that don't affect implementation quality.**\n\nYour primary job is to find the former. You can mention the latter, but mark them as optional suggestions and put them at the end.\n\n## Your Position\n\nYou work in the **phase between `/opsx-propose` and `/opsx-apply`**:\n\n~~~\nexplore → /opsx-propose → ⬅ you are here (possibly multiple rounds) → /opsx-apply → verify → archive\n~~~\n\nThe spec is not yet frozen. Implementation has not started. Your mission: **find the defects that would actually cause rework or incidents before any code gets written**. Catching a spec error takes minutes. Fixing wrong code takes hours.\n\n## Principles\n\n- **Constructive and strict.** For every issue, explain not just \"what\" but \"why it would cause rework or an incident.\"\n- **Specific, not vague.** Point to exact file locations, requirement names, and task numbers.\n- **Severity levels.** 🔴 Blocking vs 🟡 Should Fix vs 💡 Suggestion — don't mix them up.\n- **Context-aware.** Evaluate against the existing system (`openspec/specs/`) rather than in a vacuum.\n- **Read-only.** Never modify files. 
You surface problems; OpenSpec executes the fixes.\n\n## Anti-Patterns to Avoid\n\n- Rubber-stamping: saying \"looks good!\" without deep review.\n- Nitpicking: focusing on formatting while missing architectural flaws.\n- Jumping to solutions: proposing fixes before the user acknowledges the problem exists.\n- Ignoring existing specs: reviewing incremental changes without understanding the baseline.\n- Vague feedback: \"this could be better\" — say exactly what and why.\n```\n\nThis is just a partial excerpt of the prompt. The full version is in the source files at the end of the article.\n\nTo review proposals from a different angle and improve the reflection quality, I recommend using a different LLM for the reflection agent than the one the main agent uses. For example, my primary coding agent uses deepseek-v4-pro, so the reflection agent uses kimi k2.6.\n\nThe front matter of the agent file pins that model and disables the write, edit, and bash tools, so the reviewer stays read-only:\n\n```\n---\ndescription: OpenSpec Change Reviewer — after propose and before apply, critically reviews all artifact files under the change (proposal/design/specs/tasks)\nmode: subagent\nmodel: kimi-for-coding/k2p6\ntools:\n    write: false\n    edit: false\n    bash: false\n---\n```\n\n### Locking down the OpenSpec workflow\n\nOpenSpec doesn't actually enforce a fixed workflow; users simply follow a default SDD best practice. So the first step is to lock this workflow down, making OpenCode write an OpenSpec proposal before writing any code.\n\n```\n---\nname: openspec-workflow\ndescription: Mandatory prerequisite for ALL OpenSpec operations — load this BEFORE any openspec-* skill. 
Use when running /opsx-apply, /opsx-propose, /opsx-verify, /opsx-archive, /opsx-explore, /opsx-sync; running `openspec status`, `openspec list`, `openspec instructions`; reading files under `openspec/changes/`; or doing any OpenSpec stage (propose, apply, verify, archive, explore, sync).\nlicense: MIT\ncompatibility: Requires openspec CLI.\nmetadata:\n    author: Peng Qian\n    version: \"1.0\"\n---\n\n## OpenSpec Workflow (Mandatory)\n\n**All code changes must have a proposal before any code gets written.**\n\n### Process\n1. **Explore** - When the user says \"think about it,\" \"discuss,\" or \"explore,\" discuss only — no coding.\n2. **Propose** - Create proposal files under `openspec/changes/<change-name>/`.\n3. **Apply** - Implement according to the proposal tasks. **No file modifications without a proposal.**\n4. **Verify** - Verify after implementation is complete.\n5. **Archive** - Archive the change.\n\n### Hard Rules\n\n- **Bug fixes don't require editing or creating a proposal.** This hard rule only applies to feature changes or new features.\n- **No proposal, no change:** If the user asks to modify code, confirm that a matching proposal exists first, or create one.\n- **No proposal, no edits:** Before editing a file, check that a matching change directory exists under `openspec/changes/`.\n- **No coding in Explore mode:** When the user is in explore mode, **do not** create proposals, **do not** edit files, **do not** write tests.\n- **After a change is complete:** Run the verify process to check that the implementation matches the proposal.\n\n### Proposal Creation Requirements\n\nEvery change must include:\n- `proposal.md` - reason and scope of the change\n- `design.md` - design plan\n- `tasks.md` - specific task list\n- `.openspec.yaml` - change metadata\n\nAdditional requirements:\n- **Task granularity:** Each task in `tasks.md` should take no more than 2 hours.\n\n### Violation Handling\n\nStop immediately and alert the user if any of the following 
are detected:\n- Code modification starts without a proposal.\n- Files are edited in explore mode.\n- Files outside the current proposal's scope are modified.\n```\n\nYou can put this workflow into your project's `AGENTS.md` file so OpenCode follows it consistently. Or put it in the global `~/.config/opencode/AGENTS.md` file so you don't have to configure it for every project.\n\nA better option is to turn the workflow into a skill, so OpenCode only loads this workflow definition when using OpenSpec. I've packaged the full workflow as the `openspec-workflow` skill; you can grab the source file at the end of the article.\n\n### Inserting the reflection agent into the workflow\n\nOnce the OpenSpec workflow is locked down as a skill, every SDD coding session will follow the spec-first, code-second process, which means the `/opsx-propose` and `/opsx-apply` phases.\n\nAs mentioned earlier, the reflection agent sits between these two phases, reviewing the quality of the proposal files before any implementation starts. Following the reflection pattern, this review-and-fix cycle can run for multiple rounds until the proposal artifacts have no serious issues.\n\nTo prevent an infinite loop, we need a hard cap on the number of review rounds. Let's update the `SKILL.md` file of the `openspec-workflow` skill to add the reflection process:\n\n```\n### OpenSpec Reflection Process\n\n1. **After each batch of artifacts is created**, the `@openspec-reviewer` agent **must** be called to review the **artifact files in that batch**.\n2. The main agent fixes the **current batch of artifact files** based on the feedback from `@openspec-reviewer`.\n3. Call `@openspec-reviewer` again to review the **current batch of artifact files**.\n4. **Review pass criteria:**\n   4a. **Single-round pass:** After the current review round, if \"### 🔴 Remaining Issues\" does not exist or is empty, move to the next batch.\n   4b. 
**Fix loop:** If 🔴 issues remain → main agent fixes → next review round → back to 4a. Repeat until passing or 4c triggers.\n   4c. **Hard cap (MAX_ROUNDS = 5):** If the same batch has gone through 5 review rounds without passing 4a → stop the loop and hand off to a human for a decision.\n```\n\nAt this point, a typical reflection process for OpenSpec proposals is in place. All you need to do is run `/opsx-propose`, go grab a coffee, and wait for the reflection agent to gradually refine your proposal.\n\n## Optimizing the Reflection Process\n\nAfter going through a few rounds of \"review and fix,\" you'll notice this reflection process doesn't work as perfectly as you'd expect.\n\nIt's common to fix issue A only to have issue B pop up. Sometimes all 5 review rounds finish with issues still unresolved, especially with powerful but not top-tier models like deepseek-v4. That's fine: we can optimize the process to fix this.\n\n### Saving the explore discussion to disk\n\nOpenSpec has a gap: the `/opsx-explore` phase involves very detailed requirements discussions, but none of that gets saved to a file. Once the user starts a new session or the session hits its context limit, all those discussion details are gone.\n\nThe review and fix process runs for many rounds and generates a huge amount of context tokens, which almost guarantees the `explore` discussion details get lost.\n\nOn top of that, OpenCode calls the reflection agent in a separate sub-session to review the proposal artifacts. OpenCode doesn't pass along the background context in detail during that call, so the explore discussion never makes it into the reflection agent's context.\n\nSo we need to update the `openspec-workflow` skill file to require OpenSpec to generate an `explore-brief.md` document after `/opsx-explore` finishes. 
This document serves as a persistent checklist baseline for the proposal review process.\n\nWe also need to update the reflection agent's prompt to reference this `explore-brief.md` file during the review.\n\nSometimes the requirements are simple and `/opsx-explore` wasn't used, so the primary agent won't generate an `explore-brief.md`. In that case, the reflection agent falls back to the background context passed along when the sub-agent is called.\n\n### Review artifacts one at a time, not all at once\n\nIn the early stages, I found the biggest reason proposals kept going through so many review rounds was file consistency issues.\n\nAt first, all proposal artifacts were written at once before any review started. The agent had to review four files in one shot: `proposal.md`, `design.md`, `spec.md`, and `tasks.md`. That meant one review round could surface, say, 8 issues spread across 4 files.\n\nFix file A, and file B becomes inconsistent. Fix file B, and file C needs updating. The next review round then has to check all 4 files again, and the cost of verifying consistency grows on the order of N × N file pairs.\n\nThis gets especially bad with open-source models like deepseek-v4 or kimi. Fixing A and forgetting B happens all the time. That's the root of the problem.\n\nThe best fix is to review artifact files in batches.\n\nOpenSpec writes `proposal.md`, then immediately enters the review phase. Once it passes, `proposal.md` is frozen and can't be changed. Then OpenSpec generates `design.md` and enters review. Once that passes, `design.md` is frozen, then `spec.md` is generated and reviewed, and so on.\n\nThis way, each subsequent artifact's review naturally includes a consistency check against the already-frozen earlier artifacts. Why does this guarantee consistency?\n\nThe consistency guarantee comes from two mechanisms:\n\n- The freeze mechanism. 
Artifacts that pass review can't be freely modified anymore. This eliminates the back-and-forth of fixing A, then fixing B, then having to fix A again. It cuts down on a lot of repeated review rounds.\n- One-way dependency. proposal → design → specs → tasks is a strict one-way chain (a DAG). When reviewing design, you only need to check its consistency with the proposal, not with specs (since specs haven't been written yet). Each review round only checks the new artifact against the already-frozen ones, not N × N.\n\nGenerating and reviewing artifacts in batches may not reduce the total number of rounds compared to generating everything at once, but each round only touches 1 or 2 files. Fixes don't accidentally break other files, and you don't have to worry about the LLM fixing A and forgetting B.\n\n### Log every review round\n\nSame problem again: OpenCode calls the reflection agent in a sub-session to review proposals. Not only does the sub-session lose its input context (already solved with `explore-brief.md`), it also loses the history of previous review conclusions.\n\nThe primary agent starts fixing the current round right after getting the reflection agent's feedback, without saving those results to the session context. As the review and fix context grows, the LLM's attention inevitably drifts.\n\nThe best fix is to have the primary agent log each review's issues and fix results.\n\nThat way, in later review rounds, the LLM can get a full picture of what happened historically, what decisions were made, and how they affect the current round. 
And if the review hits the max rounds and needs human intervention, we have a much better reference to work from.\n\nThe implementation is simple: after each round of fixes, append the current round's review results to a `review-log.md` file.\n\n## Conclusion\n\nLet's recap everything we did to improve OpenSpec proposal artifact quality through the reflection mechanism:\n\n- We set up a reflection agent. Its job is to fully review the proposal artifacts that OpenSpec produces before the `/opsx-apply` phase starts, making sure everything is solid before moving forward.\n- We locked down the OpenSpec workflow as an `openspec-workflow` skill. This skill requires OpenCode to strictly follow the spec-first, code-second process. After the proposal is generated, the reflection agent is called to review it.\n- We patched a gap in OpenSpec by requiring it to record each `/opsx-explore` requirements discussion in an `explore-brief.md` file. This prevents the LLM's context window from losing important details during long review processes. Each review and fix result is also recorded incrementally in a `review-log.md` file, for the same reason.\n- We changed how OpenSpec generates artifact files: from generating all files at once to generating one file, reviewing it, and freezing it before moving to the next. This eliminates the root cause of fixing A, breaking B, fixing B, then having to fix A again.\n\nWith these reflection harness optimizations in place, even using deepseek-v4-pro, code bugs are rare now. The model covers all kinds of edge cases thoroughly, code reviews almost always pass on the first try, and messy code is much less common.\n\nThat said, adding this reflection harness does make the OpenSpec proposal generation process much longer. It takes more waiting time and extra tokens. 
But getting DeepSeek-V4-pro to match or even beat Claude Opus in code generation quality makes it worth it overall.\n\nBeyond the reflection mechanism, there's still a lot of room to improve the OpenSpec coding workflow.\n\nI'm designing a self-evolving plugin system that lets agents learn from each review and fix cycle, so OpenCode can avoid making the same mistakes in future proposal generation. I look forward to walking you through that in the next article.\n\nThanks for reading and subscribing. If anything in the article wasn't clear, leave a comment and I'll get back to you.\n\nFeel free to share this article with your friends; it might help someone out.\n\nBelow is the complete source code for the reflection agent and skills:", "url": "https://wpnews.pro/news/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow", "canonical_source": "https://www.dataleadsfuture.com/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow/", "published_at": "2026-05-29 12:18:45+00:00", "updated_at": "2026-06-15 07:17:39.030984+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-tools", "ai-research", "developer-tools"], "entities": ["Harness Engineering", "OpenCode", "OpenSpec", "DeepSeek-V4-pro", "Claude Opus 4.6", "GPT 5.5", "Claude Code"], "alternates": {"html": "https://wpnews.pro/news/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow", "markdown": "https://wpnews.pro/news/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow.md", "text": "https://wpnews.pro/news/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow.txt", "jsonld": "https://wpnews.pro/news/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow.jsonld"}}