Applying a Systems Engineering Framework to Agentic Coding: Why Prompts Fail and Structure Wins

wpnews.pro

Agentic AI coding tools are transforming how we build software. But they share a fundamental constraint: context windows are finite, and as chat sessions grow, AI performance degrades, a phenomenon Anthropic calls context rot. The model loses its grip on early instructions, leading to a frustrating "fix-it loop" where the agent fixes one thing but breaks another.

Most of us prompt an agent, let it write code, review it, and repeat. This works beautifully for prototypes. But when you need to build a stable, full-featured product with hundreds of mission-critical acceptance criteria (AC), "vibe-coding" breaks down.

The reality is that you get better behavior from agents the same way you get it from humans: by explicitly capturing what good and bad look like, and checking against it.

Coming from a systems engineering background in regulated industries, I knew we needed to stop treating agents like conversational chat buddies and start treating them like engineering assets. That's why I built DevCortex: a purpose-built structured intelligence layer that brings systems engineering discipline to agentic workflows.

DevCortex is an agentic development platform built on one core idea: AI agents work best when they have structured, queryable access to a database of requirements they can interrogate on demand, not a wall of text in a prompt.

It sits between the human specification and AI execution using three components:

An Agentic-V Model Database: A structured hierarchy mapping your high-level vision (ConOps) to system specs (Specs), individual requirements (Reqs), linked defects (Issues), and an auto-generated Traceability Matrix.
An MCP Server: Delivers just-in-time, high-signal context to tools like Claude Code or OpenCode. Instead of dumping requirements upfront, the agent queries exactly what it needs, when it needs it.
Human Control Planes (Web UI & CLI): A multi-user Web UI with real-time WebSocket feeds to watch your agent work, plus a powerful dcx CLI for power users and CI pipelines.
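
To make the hierarchy concrete, here is a minimal sketch of how ConOps, Specs, Reqs, ACs and Issues could nest. This is not DevCortex's actual schema: every class name, field name and status value below is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Hypothetical status values; the real product may use others.
    PENDING = "PENDING"
    PASSED = "PASSED"
    FAILED = "FAILED"


@dataclass
class AcceptanceCriterion:
    ac_id: str                      # e.g. "AC-2"
    text: str
    status: Status = Status.PENDING
    evidence: Optional[str] = None  # name of the test that proves this AC


@dataclass
class Requirement:
    req_id: str                     # e.g. "REQ-004"
    title: str
    criteria: list[AcceptanceCriterion] = field(default_factory=list)
    issue_ids: list[str] = field(default_factory=list)  # linked defects

    def is_verified(self) -> bool:
        # A requirement with no ACs is never treated as verified.
        return bool(self.criteria) and all(
            ac.status is Status.PASSED for ac in self.criteria
        )


@dataclass
class Spec:
    spec_id: str
    title: str
    requirements: list[Requirement] = field(default_factory=list)


@dataclass
class ConOps:
    vision: str
    specs: list[Spec] = field(default_factory=list)


req = Requirement(
    "REQ-004",
    "Formatting & precision",
    [AcceptanceCriterion("AC-2", "Print six digits after the decimal point")],
)
conops = ConOps("CLI unit converter", [Spec("SPEC-1", "Converter", [req])])
print(req.is_verified())
```

The point of the nesting is that "is this requirement done?" becomes a query over AC statuses rather than something the agent has to remember from earlier in a chat.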

In a recent test I compared using DevCortex and AWS Kiro to build a Python CLI unit converter. The project had 8 requirements and 31 acceptance criteria.

Step 1: Import the Spec

For this test I used the dcx CLI tool to import the Kiro requirements.md file directly into DevCortex. (Alternatively, I could have loaded the Spec and Reqs via the DevCortex Web UI, or had the integrated AI Assistant create them for me.)

dcx init

dcx import kiro ./requirements.md

(Result: 1 Spec, 8 reqs, 31 ACs populated and ready for the agent).
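
As a rough illustration of what an import step has to do, the sketch below turns a requirements file into structured records. It is not how dcx import works internally. The Markdown layout it expects (a "### Requirement N" heading followed by numbered acceptance criteria) is an assumption, and the sample text is made up.

```python
import re

# ASSUMED layout, for illustration only: requirement headings start with
# "### Requirement N" and acceptance criteria are numbered items beneath them.
REQ_HEADING = re.compile(r"^###\s+Requirement\s+(\d+)\s*:?\s*(.*)$")
AC_ITEM = re.compile(r"^\s*(\d+)\.\s+(.+)$")


def parse_requirements(markdown: str) -> list[dict]:
    """Return one dict per requirement, each carrying its acceptance criteria."""
    requirements: list[dict] = []
    for line in markdown.splitlines():
        heading = REQ_HEADING.match(line)
        if heading:
            requirements.append(
                {
                    "id": f"REQ-{int(heading.group(1)):03d}",
                    "title": heading.group(2).strip(),
                    "acs": [],
                }
            )
            continue
        item = AC_ITEM.match(line)
        if item and requirements:
            # Numbered items are attached to the most recent requirement.
            requirements[-1]["acs"].append(item.group(2).strip())
    return requirements


# Made-up sample input, not taken from the project in this article.
sample = """\
### Requirement 1: Convert length units
1. WHEN the user passes a valid value THEN the system SHALL print the result
2. WHEN the unit is unknown THEN the system SHALL exit with a non-zero code

### Requirement 2: Show help
1. WHEN --help is passed THEN the system SHALL print usage
"""

parsed = parse_requirements(sample)
print(len(parsed), sum(len(r["acs"]) for r in parsed))
```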

Step 2: Feed the Agent the Workflow

I then gave Claude Code a simple workflow via our Model Context Protocol (MCP) server:

Call dc_get_backlog to retrieve requirements.
For each requirement, fetch full AC details using dc_get_requirement.
Implement the code, write the tests, and verify.
The "Good vs. Bad" Rule: Only call dc_update_verification to mark an AC as PASSED if a test explicitly proves it.

Step 3: Watch it Work

Once executed, Claude Code didn't just blind-code. It systematically checked project health, pulled the requirement backlog, built an architectural map, and rigorously tracked its own verification.
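
The workflow from Step 2 can be sketched as a simple loop. The tool names dc_get_backlog, dc_get_requirement and dc_update_verification come from the workflow above, but their real arguments and return shapes are not shown in this article, so the in-memory stand-in below uses assumed signatures purely for illustration.

```python
from typing import Callable


class FakeDevCortex:
    """In-memory stand-in for the MCP server. All signatures are assumptions."""

    def __init__(self, requirements: dict[str, list[str]]):
        self._requirements = requirements
        self.verified: dict[tuple[str, str], str] = {}

    def dc_get_backlog(self) -> list[str]:
        return list(self._requirements)

    def dc_get_requirement(self, req_id: str) -> list[str]:
        return self._requirements[req_id]

    def dc_update_verification(self, req_id: str, ac_id: str, evidence: str) -> None:
        self.verified[(req_id, ac_id)] = evidence


def run_workflow(
    server: FakeDevCortex,
    tests: dict[tuple[str, str], Callable[[], bool]],
) -> list[tuple[str, str]]:
    """Walk the backlog and record an AC only when a test proves it.

    Returns the ACs that could not be proven, so gaps stay visible.
    """
    unproven = []
    for req_id in server.dc_get_backlog():
        for ac_id in server.dc_get_requirement(req_id):
            test = tests.get((req_id, ac_id))
            if test is not None and test():
                server.dc_update_verification(req_id, ac_id, evidence=test.__name__)
            else:
                unproven.append((req_id, ac_id))
    return unproven


def test_km_to_m() -> bool:
    return 1.5 * 1000 == 1500.0


server = FakeDevCortex({"REQ-001": ["AC-1", "AC-2"]})
gaps = run_workflow(server, {("REQ-001", "AC-1"): test_km_to_m})
print(server.verified)
print(gaps)
```

Here AC-2 has no test, so it is reported as a gap instead of being marked as passed.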

Here is what the terminal trace looked like as it interacted with the DevCortex MCP server:

● Running dc_health and fetching the full backlog...

↳ Called devcortex MCP server

● 8 requirements found. Fetching full AC details in parallel...

↳ Called devcortex 8 times

● All 8 requirements read. Generated implementation plan:

- kiropy/units.py -> REQ-002 (categories & factors)
- kiropy/converter.py -> REQ-001, REQ-003, REQ-007 (logic & accuracy)
- kiropy/printer.py -> REQ-004 (formatting & precision)
- kiropy/cli.py -> REQ-003, REQ-005, REQ-006, REQ-008 (arguments & exit codes)

● Key interpretation decisions recorded:

- "6 significant decimal places" (REQ-004 AC-2) interpreted as 6 digits after decimal point.
- Absolute zero check (REQ-003 AC-5) to block invalid physical conversions (e.g., -300°C).
- Round-trip accuracy (REQ-007 AC-2) via base-unit intermediary with zero intermediate rounding.

● Implementing project structure, test suites, and source files...

● All files written. Running test suites: 48/48 tests pass.

● Smoke tests pass. Recording verification results back to DevCortex...

↳ Called devcortex 32 times (Updating 29 distinct Acceptance Criteria to PASS)

↳ Called devcortex 8 times (Marking all 8 REQs as VERIFIED)

● Committing and closing out session.

✔ Project structure configured
✔ Core logic implemented & verified against trace matrix.
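
The three interpretation decisions in the trace are easier to judge with code in front of you. The sketch below is not the code the agent generated: the function names and the factor table are hypothetical, and only the three decisions themselves (six digits after the decimal point, an absolute zero check, conversion through a base unit with no intermediate rounding) are taken from the trace.

```python
import math

# Hypothetical factor table: each length unit expressed in metres (the base unit).
LENGTH_TO_METRES = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

ABSOLUTE_ZERO_C = -273.15


def convert_length(value: float, src: str, dst: str) -> float:
    """Convert through the base unit; nothing is rounded until display time."""
    return value * LENGTH_TO_METRES[src] / LENGTH_TO_METRES[dst]


def celsius_to_kelvin(value: float) -> float:
    """Reject temperatures below absolute zero instead of converting them."""
    if value < ABSOLUTE_ZERO_C:
        raise ValueError(f"{value}°C is below absolute zero")
    return value - ABSOLUTE_ZERO_C


def format_result(value: float) -> str:
    """'6 significant decimal places' read as six digits after the decimal point."""
    return f"{value:.6f}"


print(format_result(convert_length(1, "mi", "km")))

# Round trip: ft -> mi -> ft should return the starting value within float tolerance.
round_trip = convert_length(convert_length(5, "ft", "mi"), "mi", "ft")
print(math.isclose(round_trip, 5.0))

try:
    celsius_to_kelvin(-300)
except ValueError as exc:
    print(exc)
```

Rounding only in format_result is what keeps the round trip accurate: if each hop rounded to six places, small errors would accumulate across conversions.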

The application was built flawlessly with all 31 acceptance criteria verified and marked PASSED.

More importantly, the traceability matrix completely bridges the trust gap. Every single AC links directly to a named test and explicit evidence. If you review this codebase in six months, you will know exactly why and how every requirement was fulfilled.
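
A traceability matrix of this kind is essentially a join between acceptance criteria and the tests that prove them. The sketch below shows the idea under assumed data shapes. It is not DevCortex's auto-generated matrix, and the row fields and status strings are hypothetical.

```python
from typing import NamedTuple, Optional


class TraceRow(NamedTuple):
    req_id: str
    ac_id: str
    test_name: Optional[str]  # None means no evidence has been recorded
    status: str


def build_matrix(
    criteria: dict[str, list[str]],
    evidence: dict[tuple[str, str], str],
) -> list[TraceRow]:
    """Emit one row per AC, linked to its proving test when one exists."""
    rows = []
    for req_id, ac_ids in criteria.items():
        for ac_id in ac_ids:
            test_name = evidence.get((req_id, ac_id))
            status = "PASSED" if test_name else "PENDING"
            rows.append(TraceRow(req_id, ac_id, test_name, status))
    return rows


def coverage(rows: list[TraceRow]) -> float:
    """Fraction of ACs that have recorded evidence."""
    if not rows:
        return 0.0
    return sum(row.status == "PASSED" for row in rows) / len(rows)


rows = build_matrix(
    {"REQ-004": ["AC-1", "AC-2"]},
    {("REQ-004", "AC-2"): "test_six_decimals"},
)
for row in rows:
    print(row)
print(coverage(rows))
```

Rows with no test name are the useful part of the report: they show exactly which criteria still lack evidence.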

Structured requirements reduce drift: When an agent is bound to a structured backlog contract, it is far less likely to hallucinate features or skip "trivial" requirements.

Evidence-based verification reduces errors: Requiring the agent to provide test proof caught instances where the AI's initial code passed a shallow test but missed the spirit of the AC. The agent caught its own gaps and fixed them before claiming completion.
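
One way to enforce a rule like this mechanically is to gate the PASSED status on a machine-readable test report. The sketch below reads a JUnit-style XML report (pytest can write one with --junitxml) and allows an AC to pass only if every test named as its evidence actually passed. The function names and the sample report are assumptions for illustration, not part of DevCortex.

```python
import xml.etree.ElementTree as ET


def passed_tests(junit_xml: str) -> set:
    """Names of test cases that have no failure, error, or skipped child."""
    root = ET.fromstring(junit_xml)
    passed = set()
    for case in root.iter("testcase"):
        bad = any(case.find(tag) is not None for tag in ("failure", "error", "skipped"))
        if not bad:
            passed.add(case.get("name"))
    return passed


def can_mark_passed(ac_tests: list[str], junit_xml: str) -> bool:
    """An AC qualifies only if it names at least one test and all of them passed."""
    results = passed_tests(junit_xml)
    return bool(ac_tests) and all(name in results for name in ac_tests)


# Made-up report with one passing and one failing test.
REPORT = """<testsuites><testsuite name="pytest" tests="2" failures="1">
<testcase classname="tests.test_printer" name="test_six_decimals"/>
<testcase classname="tests.test_converter" name="test_absolute_zero">
<failure message="assert"/></testcase>
</testsuite></testsuites>"""

print(can_mark_passed(["test_six_decimals"], REPORT))
print(can_mark_passed(["test_absolute_zero"], REPORT))
print(can_mark_passed([], REPORT))
```

Treating an empty evidence list as a failure is deliberate: an AC with no named test has not been proven, however plausible the code looks.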

Effective use of agent context increases determinism: Because LLMs are constrained by a finite attention budget, letting the coding agent fetch specific requirements and ACs as it needs them helps it maintain focus on the job at hand.

DevCortex is now available at devcortexai.com with a free tier. You can also install the CLI right now via npm:

npm install -g @devcortex/cli

Check out the getting started guide to connect Claude Code or OpenCode via MCP and run your first verified build, or read our case study about building a full-stack Career Journal App with DevCortex.

If you're working on agentic systems engineering or requirement-driven development, I'd love to compare notes in the comments below!

source & further reading

dev.to (original article)