{"slug": "cape-collaborative-agents-prompt-engineering", "title": "CAPE - Collaborative Agents Prompt Engineering", "summary": "A developer introduced CAPE (Collaborative Agents Prompt Engineering), a multi-agent framework that models a software product team with four specialized roles: Product Owner, Designer, Developer, and Cape Master. The framework enforces role fidelity and uses a structured handoff protocol and KPT-based retrospectives to improve collaboration across sessions. An open-source CLI implementation is available.", "body_md": "**A Role-Based Multi-Agent Framework with Human Team Dynamics**\n\nLarge language models are increasingly capable of producing high-quality outputs in isolation, yet single-agent systems lack the specialization, accountability, and self-correction that human teams develop naturally over time. CAPE (Collaborative Agents Prompt Engineering) proposes a structured multi-agent framework that mirrors the dynamics of a real software product team. Four specialized agents — Product Owner, Designer, Developer, and a Cape Master facilitator — collaborate within a defined session protocol, produce structured artifacts, and improve their collaboration across sessions through a KPT-based retrospective system. This paper describes the design of the framework, its measurement model, and the architectural decisions behind its open-source CLI implementation.\n\nWhen a user prompts a general-purpose AI with a complex product task — \"add dark mode to the app\" — the model must simultaneously reason about business value, user experience, technical feasibility, and implementation specifics. 
This conflation of concerns degrades output quality in each dimension and leaves no mechanism for specialization, cross-checking, or learning between runs.\n\nHuman teams solve this through role specialization and structured collaboration: a product owner defines value, a designer specifies experience, a developer implements, and a facilitator keeps the process honest. The question CAPE addresses is: *can these dynamics be faithfully modeled in a multi-agent system, and does the structure yield better outcomes than a single capable agent?*\n\nCAPE introduces four agent roles with strict role fidelity: agents are explicitly prohibited from drifting into each other's domains. A session follows a linear handoff protocol: CM (Opening) → POA → DA → DevA → CM (Retrospective). Each agent receives only the outputs it needs from upstream agents, which reduces context noise while still enforcing cross-referencing. After each session, the Cape Master facilitates a structured retrospective that produces KPT (Keep / Problem / Try) items; these are accumulated in a merged history file that is read at the opening of every subsequent session. The result is a feedback loop across sessions that progressively informs the team's behavior.\n\nCAPE defines four agent roles, each with a strict scope boundary:\n\n| Agent | Role | Domain |\n|---|---|---|\n| CM | Cape Master | Process facilitation, DoR checking, KPT retrospective |\n| POA | Product Owner Agent | Business value, acceptance criteria, prioritization |\n| DA | Designer Agent | UX rationale, component specifications, brand adherence |\n| DevA | Developer Agent | Technical architecture, implementation, file generation |\n\nRole fidelity is enforced at the prompt level: each agent's instructions explicitly state what it must not do. 
POA does not propose technical solutions; DA does not override business requirements; DevA does not reprioritize the backlog; CM does not produce domain outputs of any kind.\n\nThe Cape Master (CM) is the Scrum Master analog of the system. It appears twice in each session:\n\n**Opening.** CM reads the objective, checks it against Definition of Ready criteria, identifies session-specific risks, and gives each agent a tailored coaching note. If historical KPT data exists from prior sessions, CM explicitly references recurring Problems and unresolved Tries in its coaching.\n\n**Retrospective.** After all domain agents have produced their outputs, CM facilitates a structured conversation among the full team. Each agent reflects on its contribution, surfaces friction, and proposes improvements. CM derives a KPT from the conversation and computes satisfaction metrics.\n\nThis dual-appearance structure mirrors how a skilled Scrum Master sets context before a sprint and extracts learning after it.\n\nEvery agent message conforms to a structured JSON envelope:\n\n```\n{\n  \"agent\": \"AgentName\",\n  \"role\": \"RoleDescription\",\n  \"task_id\": \"unique_task_identifier\",\n  \"voice\": \"Casual first-person reflection in session language\",\n  \"output\": \"Domain-specific content\",\n  \"references\": [\"AgentName\"],\n  \"confidence\": 4,\n  \"ass\": {\n    \"score\": 4,\n    \"positive\": \"What worked well\",\n    \"improvement\": \"What to improve\",\n    \"context_adequate\": true\n  }\n}\n```\n\nThe `voice` field is a deliberate design choice: it produces a casual, first-person reflection that is surfaced in the terminal during streaming, making the session feel like a team conversation rather than a pipeline of API calls. The `references` field enforces explicit acknowledgment of upstream outputs rather than implicit consumption. The `ass` object carries the agent's self-reported Agent Satisfaction Score for the task, together with notes on what worked and what to improve.\n\nEach project using CAPE maintains a `cape/` directory that serves as the context repository for agents. 
Files are organized by domain:\n\n```\ncape/\n├── 0_team/          # Shared protocol, culture, DoR, DoD\n├── 1_product/       # MVV, personas, milestones, backlog\n├── 2_design/        # Design principles, brand guidelines\n├── 3_development/   # Architecture, coding standards, playbook\n├── 4_orchestration/ # Process definitions, metrics\n└── 5_sessions/\n    ├── pair/        # Per-session raw artifacts (JSON)\n    └── retrospective/  # Per-session KPT markdown + kpt_merged.md\n```\n\nFiles carry a `type` frontmatter field: `generic` files define the CAPE framework itself and should not be modified; `input` files are project-specific and must be filled in before sessions. The `cape init` command generates all files interactively by asking the user questions in natural language and calling an LLM to translate the answers into structured project context.\n\nEach session closes with a KPT made of three lists: Keep (what worked and should continue), Problem (what caused friction), and Try (concrete changes to attempt next time). These are persisted as a dated markdown file: `cape/5_sessions/retrospective/YYYY-MM-DD_HH-MM_<task-id>.md`.\n\nA single accumulated file, `kpt_merged.md`, grows across sessions. Each item is tagged with its originating session ID:\n\n```\n## Keep\n- [task-001] Clear Given-When-Then acceptance criteria enabled DevA to implement confidently\n- [task-002] DA component list produced before DevA started reduced rework\n\n## Problem\n- [task-001] Package dependencies written by DevA not installed automatically\n- [task-002] DA specifications arrived after DevA had already begun\n\n## Try\n- [task-001] Auto-run npm install when package.json is modified by DevA\n- [task-002] DA to finalize component list before DevA's planning step begins\n```\n\nAt every session opening, CM reads this file and injects it into its prompt. CM is instructed to flag Problems that recur across sessions and to acknowledge Tries that were adopted. 
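Recurrence detection is left to CM's prompt, but the session tags in the merged file also make a deterministic pre-pass easy. The sketch below is a minimal TypeScript illustration, not part of CAPE: `parseKptLines` and `recurringProblems` are hypothetical names, the caller is assumed to have already split `kpt_merged.md` into lines, and matching Problems by lowercased exact text is an assumption that will miss paraphrased repeats that CM could still catch.

```typescript
// Hypothetical pre-pass over kpt_merged.md; CAPE itself leaves recurrence
// detection to the Cape Master prompt.
type Section = 'Keep' | 'Problem' | 'Try';

interface KptItem {
  section: Section;
  taskId: string; // originating session tag, e.g. task-001
  text: string;
}

// Parses lines of kpt_merged.md (already split by the caller) into tagged items.
// Expected item shape: - [task-001] Item text
function parseKptLines(lines: string[]): KptItem[] {
  const items: KptItem[] = [];
  let section: Section | null = null;
  for (const raw of lines) {
    const line = raw.trim();
    if (line.startsWith('## ')) {
      const name = line.slice(3).trim();
      section = name === 'Keep' || name === 'Problem' || name === 'Try' ? name : null;
      continue;
    }
    if (section === null || !line.startsWith('- [')) continue;
    const close = line.indexOf(']');
    if (close < 0) continue; // malformed tag: skip the line
    items.push({ section, taskId: line.slice(3, close), text: line.slice(close + 1).trim() });
  }
  return items;
}

// Groups Problem items by lowercased text and keeps only those tagged with
// two or more distinct sessions, mapped to their sorted session tags.
function recurringProblems(items: KptItem[]): Map<string, string[]> {
  const sessionsByText = new Map<string, Set<string>>();
  for (const item of items) {
    if (item.section !== 'Problem') continue;
    const key = item.text.toLowerCase();
    const sessions = sessionsByText.get(key) ?? new Set<string>();
    sessions.add(item.taskId);
    sessionsByText.set(key, sessions);
  }
  const recurring = new Map<string, string[]>();
  sessionsByText.forEach((sessions, text) => {
    if (sessions.size >= 2) recurring.set(text, Array.from(sessions).sort());
  });
  return recurring;
}
```

Feeding a list like this into CM's opening prompt would be one way to make recurrence explicit instead of relying solely on the model to notice it in a long file.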
Injecting the merged file this way gives the team a lightweight long-term memory without requiring a vector database or external storage; the compression happens through the KPT structure itself.\n\nKPT is semantically dense: three categories cover the full space of retrospective insight in a format that remains actionable. Unlike a full conversation transcript, a KPT list accumulates linearly without growing stale: a Problem from session 1 that still appears in session 5 is a signal, not noise. The tagging system makes recurrence visible without requiring summarization.\n\nThe Agent Satisfaction Score (ASS) is a self-reported score on a 1–5 Likert scale that each agent assigns to its own contribution after every task. It is the primary quality signal in CAPE because it is role-relative: a DevA score of 4 means \"I had the context and specs I needed to implement well,\" not \"the feature is good.\" This makes it a leading indicator of collaboration quality rather than a lagging indicator of output correctness.\n\nDerived metrics:\n\n| Metric | Description | Target |\n|---|---|---|\n| ASS | Per-agent satisfaction score | ≥4.0 average |\n| SV (Satisfaction Variance) | Variability across agents; indicates role imbalance | Decreasing over iterations |\n| ST (Satisfaction Trend) | Trajectory of ASS across successive tasks | Positive slope |\n\n| Metric | Description |\n|---|---|\n| XRF (Cross-Reference Frequency) | Count of explicit `references` entries; measures inter-agent awareness |\n| CI (Consensus Iterations) | Dialogue turns needed before CM closes the retrospective |\n| TSR (Task Success Rate) | Acceptance criteria fulfillment rate |\n| RR (Reproducibility Rate) | Output consistency across repeated runs with identical input |\n| ETT (Execution Time per Task) | Wall-clock time from session start to retrospective save |\n| CR (Compression Ratio) | Total input token length ÷ tokens of context passed to each agent |\n\n| Metric | Threshold |\n|---|---|\n| Average ASS | ≥4.0 |\n| SV | Decreasing across iterations |\n| TSR | ≥80% |\n| RR | 
≥90% |\n| Token reduction vs. single-agent | ≥30% |\n\nCAPE is implemented as an open-source Node.js CLI package with these components:\n\n- Mastra (`@mastra/core`), which provides `createStep`, `createWorkflow`, and streaming agent primitives\n- `@ai-sdk/anthropic` for access to Anthropic models\n- `claude-sonnet-4-6`, and `claude-haiku-4-5` for parallel calls where speed matters\n- A CLI entry point (`bin/cape.js`) with no framework dependency\n\nThe session workflow is a linear Mastra pipeline in which each step receives the previous step's output as its input (typed by the step's `inputSchema`) and can access all prior step outputs via `getStepResult`:\n\n```\ncmOpenStep → poaStep → daStep → devaStep → cmRetroStep\n```\n\nEach step streams the agent response to stdout in real time using a token-by-token reader with a state machine that handles JSON escape sequences (`\\n`, `\\t`, `\\\\`) to produce readable terminal output from raw JSON strings.\n\nDevA produces implementation files inside `<file path=\"...\">` XML blocks. The workflow extracts these with a regex and writes them to `CAPE_PROJECT_DIR` (defaulting to `cwd`). This lets CAPE modify the user's actual project codebase as part of a session, not just produce plans.\n\n`cape start` enters an interactive loop: after each session completes, the user is prompted for the next objective, and an empty line exits. This eliminates the startup cost of repeated CLI invocations and keeps the session context in a single process, which is particularly relevant as the merged KPT file grows.\n\nAll agent voice and conversation outputs are language-configurable via `--lang` (e.g., `--lang ja` for Japanese). The `language` field is injected into every agent prompt, ensuring a consistent output language across the full session while the prompt asset files themselves stay in English.\n\n**Linear handoff over parallel execution.** Agents run sequentially, not in parallel. This is intentional: each agent's output is meant to inform the next. 
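The sequential handoff can be illustrated in plain TypeScript. This is a minimal sketch that deliberately avoids Mastra's API: `runSession`, `StepSpec`, the `needs` lists, and the stubbed agents are all hypothetical, and the envelope keeps only a subset of the fields in the JSON envelope described earlier. Each step receives only the upstream envelopes it declares, and a guard rejects any `references` entry naming an agent whose output the step never received.

```typescript
// Plain TypeScript sketch of a linear handoff; not Mastra's createWorkflow API.
interface Envelope {
  agent: string;
  task_id: string;
  output: string;
  references: string[]; // upstream agents this output explicitly builds on
  confidence: number;
}

interface StepSpec {
  agent: string;
  needs: string[]; // upstream agents whose outputs this step receives
  run: (objective: string, inputs: Envelope[]) => Promise<Envelope>;
}

// Runs steps strictly in order. Each step sees only the envelopes it declares
// in needs, and may only reference agents among those inputs.
async function runSession(objective: string, steps: StepSpec[]): Promise<Envelope[]> {
  const transcript: Envelope[] = [];
  for (const step of steps) {
    const inputs = transcript.filter((e) => step.needs.includes(e.agent));
    const envelope = await step.run(objective, inputs);
    const received = new Set(inputs.map((e) => e.agent));
    for (const ref of envelope.references) {
      if (!received.has(ref)) {
        throw new Error(`${step.agent} references ${ref}, which it did not receive`);
      }
    }
    transcript.push(envelope);
  }
  return transcript;
}

// Stub agent standing in for an LLM call: echoes the objective and
// references every input it received.
function stub(agent: string): StepSpec['run'] {
  return async (objective, inputs) => ({
    agent,
    task_id: 'task-001',
    output: `${agent} output for: ${objective}`,
    references: inputs.map((e) => e.agent),
    confidence: 4,
  });
}

// Session order from the protocol; the needs lists are illustrative only.
const session: StepSpec[] = [
  { agent: 'CM', needs: [], run: stub('CM') },
  { agent: 'POA', needs: ['CM'], run: stub('POA') },
  { agent: 'DA', needs: ['CM', 'POA'], run: stub('DA') },
  { agent: 'DevA', needs: ['CM', 'POA', 'DA'], run: stub('DevA') },
  { agent: 'CM', needs: ['CM', 'POA', 'DA', 'DevA'], run: stub('CM') },
];
```

If the same steps were launched in parallel, DA and DevA would start before their upstream envelopes existed and would receive empty inputs, which is exactly the loss of cross-referencing this design decision avoids.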
DA should react to POA's acceptance criteria; DevA should react to DA's component specs. Parallelizing would eliminate this cross-referencing, which is a core protocol requirement.\n\n**JSON as the inter-agent format.** Structured JSON envelopes enforce completeness (every field must be present) and enable deterministic parsing for terminal rendering, artifact storage, and metric extraction. The `voice` field within the envelope preserves a human-readable, informal register without sacrificing parsability.\n\n**KPT over full transcript as long-term memory.** Injecting full retrospective transcripts into subsequent sessions would rapidly exhaust context budgets. KPT's three-category structure compresses an entire session's learning into ≤10 items, retaining the signal while discarding the noise.\n\n**Cape Master as both opener and closer.** Using a single agent for both phases ensures continuity: CM's opening concerns are referenced explicitly in the retrospective, creating accountability within a session. A developer reading the retrospective can trace CM's prediction (\"watch out for unclear data model requirements\") against what actually happened.\n\n**Role fidelity over generalism.** Each agent's instructions contain explicit prohibitions (\"do not propose technical implementations,\" \"do not override design rationale\"). This is enforced through prompt design, not code. The cost is some rigidity; the benefit is that agents remain legible, because their outputs are predictable given their role.\n\nThe current workflow is strictly linear: CM Open → POA → DA → DevA → CM Retro. When DevA encounters technically infeasible specifications from DA (mismatched data models, missing API contracts, incompatible constraints), the pipeline has no mechanism to route that feedback back upstream within the same session. 
The infeasibility is recorded as a Problem in the retrospective and surfaced to DA as a Try in the next session, but by then the current session's output is already degraded. This is an inherent tradeoff of the waterfall handoff model: predictability and legibility come at the cost of intra-session adaptability. A local feedback loop (DevA → DA → DevA) capped at one or two iterations would likely raise single-session Task Success Rate, but it introduces the risk of recursive stalls and role drift that the linear model avoids by design.\n\nDevA writes files directly to the project root using paths it generates from context. There are no guardrails against overwriting existing logic, injecting broken imports, or adding conflicting dependencies to `package.json`. The framework trusts the LLM to scope its changes correctly, which is a reasonable assumption for greenfield tasks but unreliable for sessions that modify mature codebases. Practical mitigations, such as running `tsc --noEmit` or `npm test` after file generation and feeding errors back to DevA for self-correction, or staging writes as a dry-run diff before applying them, are not yet implemented. Until they are, DevA's file output should be treated as a draft that requires human review rather than as a safe auto-apply.\n\nThe merged KPT file grows monotonically: each session appends its Keep, Problem, and Try items without pruning. This is acceptable for tens of sessions, but over hundreds of sessions the file will grow large enough to consume a meaningful share of the context budget and to introduce noise, because resolved Tries and dormant Problems from early sessions carry the same weight as recent actionable items. A consolidation mechanism (`cape housekeeping`) that periodically archives items older than N sessions or marks Tries as adopted or dropped would prevent this degradation. 
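The proposed consolidation could work roughly as follows. This is a hypothetical TypeScript sketch: `cape housekeeping` does not exist yet, and the `consolidate` function, the `status` field on Try items, and the rule that keeps an older Problem active when it reappears in the recent window are assumptions, chosen so that pruning does not erase the recurrence signal the merged file is meant to carry.

```typescript
// Hypothetical sketch of the proposed cape housekeeping consolidation.
// Neither the command nor the status field exists in CAPE today.
type Section = 'Keep' | 'Problem' | 'Try';
type TryStatus = 'open' | 'adopted' | 'dropped';

interface KptItem {
  section: Section;
  taskId: string; // originating session tag, e.g. task-001
  text: string;
  status?: TryStatus; // only meaningful for Try items
}

interface Consolidated {
  active: KptItem[]; // stays in kpt_merged.md
  archived: KptItem[]; // moved out of the prompt context
}

// sessionOrder lists session tags oldest first; keepLast plays the role of N,
// so items from sessions older than the last N are archived.
function consolidate(items: KptItem[], sessionOrder: string[], keepLast: number): Consolidated {
  const recent = new Set(sessionOrder.slice(Math.max(0, sessionOrder.length - keepLast)));
  // Problems seen in the recent window; older copies of these stay active so
  // that recurrence remains visible to CM.
  const recentProblems = new Set(
    items
      .filter((i) => i.section === 'Problem' && recent.has(i.taskId))
      .map((i) => i.text.toLowerCase()),
  );
  const active: KptItem[] = [];
  const archived: KptItem[] = [];
  for (const item of items) {
    const resolvedTry =
      item.section === 'Try' && (item.status === 'adopted' || item.status === 'dropped');
    const recurringProblem =
      item.section === 'Problem' && recentProblems.has(item.text.toLowerCase());
    if (!resolvedTry && (recent.has(item.taskId) || recurringProblem)) {
      active.push(item);
    } else {
      archived.push(item);
    }
  }
  return { active, archived };
}
```

Writing the `archived` items to a separate file instead of deleting them would keep the full history available for human review without spending CM's context on it.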
Without such a mechanism, the long-term memory advantage of the KPT format eventually inverts into a liability.\n\n`cape/` as a Try action, with CM reviewing before applying.\n\nCAPE demonstrates that imposing role structure, turn-taking protocol, and retrospective learning on a multi-agent system produces a qualitatively different interaction from both a single-agent prompt and an unstructured multi-agent conversation. The Cape Master role, absent from most multi-agent frameworks, is the mechanism through which process quality is maintained and learning is carried forward. The KPT-based memory system provides longitudinal continuity without architectural complexity. Whether this structure improves measurable task outcomes relative to a single capable model is an empirical question that the framework's metric system is designed to answer.\n\n| Term | Definition |\n|---|---|\n| ASS | Agent Satisfaction Score, a self-reported 1–5 Likert score per agent per task |\n| SV | Satisfaction Variance, the variability of ASS across agents |\n| ST | Satisfaction Trend, the ASS trajectory across successive sessions |\n| XRF | Cross-Reference Frequency, the count of explicit inter-agent references |\n| KPT | Keep / Problem / Try, the retrospective format |\n| DoR | Definition of Ready, the criteria a task must meet before a session begins |\n| DoD | Definition of Done, the criteria that must be met for a session to be considered complete |\n| Role fidelity | Constraint that each agent operates strictly within its domain |\n| Cape Master | The session facilitator agent; the Scrum Master analog in CAPE |", "url": "https://wpnews.pro/news/cape-collaborative-agents-prompt-engineering", "canonical_source": "https://dev.to/watilde/cape-collaborative-agents-prompt-engineering-8hi", "published_at": "2026-06-29 12:12:27+00:00", "updated_at": "2026-06-29 12:21:08.516698+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools"], "entities": ["CAPE", "Product Owner Agent", "Designer Agent", 
"Developer Agent", "Cape Master"], "alternates": {"html": "https://wpnews.pro/news/cape-collaborative-agents-prompt-engineering", "markdown": "https://wpnews.pro/news/cape-collaborative-agents-prompt-engineering.md", "text": "https://wpnews.pro/news/cape-collaborative-agents-prompt-engineering.txt", "jsonld": "https://wpnews.pro/news/cape-collaborative-agents-prompt-engineering.jsonld"}}