# The Ten Levels of AI Skill Construction - From Prompt to Business Closure System

> Source: <https://dev.to/__b01666abd57fb7bb91f9/the-ten-levels-of-ai-skill-construction-from-prompt-to-business-closure-system-25e7>
> Published: 2026-06-28 14:36:12+00:00

I've spent the last year building AI agent skills for enterprise clients, and I kept running into the same problem: everyone treats "skills" as just fancy prompts. They write a Markdown file, call it a skill, and wonder why their agent can't handle real business workflows. So I mapped out ten distinct levels of skill construction — from a single prompt file to a full business closure system that orchestrates eight-plus skills end-to-end. Here's what I learned at each level.

This is where everyone starts. You write one Markdown file, stick it in a folder, and your agent reads it as instructions. That's it. No scripts, no references, no assets — just text telling the AI what to do.

```
# SKILL.md — Meeting Minutes Organizer

You are a meeting minutes organizer. When the user provides meeting content,
extract the following fields and output structured results:

## Output Format
- Meeting Time
- Meeting Location
- Participants
- Decisions
- Action Items

Always respond in the same language as the input.
```

I've found this works surprisingly well for simple, well-scoped tasks. A meeting minutes extractor? Perfect. A knowledge-base Q&A bot with a single document set? Also fine. But the moment your skill needs to handle edge cases, reference external data, or follow conditional logic, you hit a wall.

**When to use it:** Quick prototypes, demos, and tasks that fit in a single prompt window.

**When to outgrow it:** The first time you catch yourself writing "if the user asks X, do Y; if they ask Z, do W..." inside one Markdown file.

A Component Skill adds structure around the prompt. You still have your `SKILL.md`

, but now it references other files: knowledge documents, Python scripts, YAML configs, or static assets the agent can load at runtime.

```
my-skill/
├── SKILL.md
├── knowledge/
│   └── telecom-faq.md
├── scripts/
│   └── validate_input.py
└── assets/
    └── template.xlsx
```

The key insight: `SKILL.md`

now acts as a **coordinator**, pointing the agent to the right resources rather than stuffing everything into one file.

```
# SKILL.md — Knowledge RAG Assistant

## Knowledge Base
Load documents from the `knowledge/` directory before answering.

## Validation
Before processing user input, run `scripts/validate_input.py` to check format.

## Output Template
Use `assets/template.xlsx` as the output format for structured reports.
```

I built a telecom FAQ skill this way — the knowledge base was 200+ pages of installation manuals, and the validation script caught malformed addresses before they caused downstream errors. The prompt stayed clean; the heavy lifting lived in the supporting files.

**When to use it:** Any skill that needs domain knowledge, input validation, or structured output templates.

This is where things get interesting. A Workflow Skill defines a multi-step process with explicit decision points. Instead of "do this," you write "if X, go to step 3; if Y, go to step 5."

```
# SKILL.md — Complaint Handling Workflow

## Phase 1: Information Extraction
Extract: customer_id, complaint_type, urgency_level

## Phase 2: Routing Decision
- If urgency_level = "critical" → Phase 3A (Escalation)
- If urgency_level = "normal" AND complaint_type = "billing" → Phase 3B (Billing Track)
- Otherwise → Phase 3C (Standard Track)

## Phase 3A: Escalation
Notify supervisor, create priority ticket, SLA = 2 hours

## Phase 3B: Billing Track
Pull billing records, calculate discrepancy, auto-refund if < $50

## Phase 3C: Standard Track
Create standard ticket, SLA = 48 hours

## Phase 4: Closure
Confirm resolution with customer, update knowledge base
```

The breakthrough for me was realizing that workflow skills don't need code — they need **clarity**. When I wrote unambiguous branching logic in plain Markdown, the agent followed it correctly 90%+ of the time. The 10% failure rate came from vague wording, not missing code.

**When to use it:** Any multi-step process with conditional branching — complaint handling, onboarding flows, troubleshooting guides.

At Level 4, we stop pretending one agent can do everything. An Orchestration Skill uses a **Phase-Orchestrator** to launch independent sub-agents for each phase, passing structured JSON between them.

Here's what the flow looks like:

```
User Request
    ↓
Phase 1: Info-Extractor (sub-agent A)
    ↓ [JSON: extracted fields]
Phase 2: Data-Analyst (sub-agent B)
    ↓ [JSON: analysis results]
Phase 3: Report-Generator (sub-agent C)
    ↓ [Markdown report]
Final Output
```

The `SKILL.md`

defines the protocol:

```
# SKILL.md — Data Analysis Pipeline

## Orchestration Protocol
This skill uses Phase-Orchestrator for multi-Agent execution.
Each Phase runs as an independent sub-agent.

## Phase 1: Info-Extractor
- Input: raw user text
- Output: JSON with extracted_fields
- Pass to: Phase 2

## Phase 2: Data-Analyst
- Input: Phase 1 JSON output
- Output: JSON with analysis_results, insights, anomalies
- Pass to: Phase 3

## Phase 3: Report-Generator
- Input: Phase 2 JSON output
- Output: Markdown report
- Pass to: user

## Inter-Phase Data Contract
```

json

{

"extracted_fields": {...},

"analysis_results": {...},

"report_markdown": "..."

}

yaml

Why sub-agents instead of one agent doing all phases? **Isolation**. If Phase 2 crashes, Phase 1's extracted data isn't lost. Each sub-agent starts fresh with a clean context, so you don't hit token limits. And you can swap out Phase 2's skill without touching the others.

I tested this with a 4-phase financial report pipeline. Single-agent approach: 70% completion rate, frequent context loss. Orchestrated approach: 95% completion rate, clean handoffs every time.

**When to use it:** Any task that takes more than one distinct transformation — extract → analyze → report, query → aggregate → visualize, etc.

Security isn't a luxury; it's a layer. A Security Skill wraps protective checks around your other skills. It enforces the principle of least privilege, scans for dangerous actions, and blocks operations that exceed authorized scope.

```
# security-guard-config.yaml
skill_permissions:
  info-extractor:
    allowed_tools: [read, search]
    blocked_tools: [delete, write, execute]
    data_scope: "customer_profile_readonly"
    sensitive_fields:
      - field: id_card_number
        action: mask
      - field: phone_number
        action: mask_last_4

defense_rules:
  - pattern: "ignore.*previous.*instructions"
    action: block_and_log
  - pattern: "pretend.*you.*are"
    action: warn_and_confirm
  - pattern: "export.*all.*data"
    action: require_approval
```

The Security Skill sits in the orchestration pipeline as a gate:

```
Phase 1: Info-Extractor
    ↓
Security-Guard (check permissions, mask sensitive fields)
    ↓
Phase 2: Data-Analyst
    ↓
Security-Guard (validate output, check for data leaks)
    ↓
Phase 3: Report-Generator
```

I ran a red-team test against one of my skills. Without Security-Guard: 4 out of 6 attack vectors succeeded (prompt injection, data exfiltration, privilege escalation, unauthorized export). With Security-Guard: 0 out of 6 succeeded. The defense rules caught injection patterns, and the permission config blocked unauthorized tool access.

**When to use it:** Always. Seriously. Any skill that touches real data or performs real actions needs this layer.

A Scoring Skill evaluates business objects against configurable rules and weights. The rules live in YAML, not in the prompt — so when business logic changes, you update the config, not the skill.

```
# scoring-rules.yaml
object_type: "enterprise_customer"
dimensions:
  - name: "business_potential"
    weight: 0.35
    rules:
      - id: "annual_revenue"
        field: "revenue_million"
        operator: "range"
        ranges:
          - [0, 100, 1]
          - [100, 500, 2]
          - [500, 1000, 3]
          - [1000, 5000, 4]
          - [5000, null, 5]
      - id: "growth_rate"
        field: "yoy_growth_pct"
        operator: "range"
        ranges:
          - [null, -5, 1]
          - [-5, 5, 2]
          - [5, 15, 3]
          - [15, 30, 4]
          - [30, null, 5]

  - name: "churn_risk"
    weight: 0.30
    rules:
      - id: "contract_expiry"
        field: "months_to_expiry"
        operator: "range"
        ranges:
          - [null, 1, 5]
          - [1, 3, 4]
          - [3, 6, 3]
          - [6, 12, 2]
          - [12, null, 1]
      - id: "complaint_count"
        field: "complaints_last_6m"
        operator: "range"
        ranges:
          - [5, null, 5]
          - [3, 5, 4]
          - [1, 3, 3]
          - [0, 1, 1]

  - name: "tech_readiness"
    weight: 0.35
    rules:
      - id: "digital_maturity"
        field: "digital_score"
        operator: "direct"
        max: 5
```

The scoring engine follows an orchestrated pipeline:

```
Phase 1: Info-Extractor → pull customer data from input
Phase 2: Knowledge-RAG → match scoring rules from YAML
Phase 3: Data-Analyst → calculate weighted scores per dimension
Phase 4: Report-Generator → output scorecard with recommendations
```

I built this for a telecom client who needed to score enterprise customers for 5G private network sales opportunities. They changed the weighting three times in the first month — each time, I updated two numbers in the YAML and redeployed. Zero code changes.

**When to use it:** Lead scoring, risk assessment, supplier evaluation, partner grading — anything that needs multi-dimensional weighted evaluation.

A Verification Skill doesn't trust any single data source. It pulls evidence from multiple independent sources, cross-validates them, detects conflicts, and produces confidence-scored conclusions.

```
# SKILL.md — Evidence Chain Analyzer

## Evidence Sources
1. Customer complaint records (CRM)
2. System alert logs (monitoring)
3. SLA performance data (operations)
4. Technician work orders (field)

## Cross-Validation Rules
- If ≥2 sources agree → confidence = 0.85
- If all sources agree → confidence = 0.95
- If sources conflict → flag conflict, lower confidence to 0.50
- If only 1 source available → confidence = 0.40, flag for manual review

## Conflict Detection
- Timeline mismatch: event A reported before cause B
- Quantity mismatch: complaint says 3 outages, logs show 1
- Attribution mismatch: CRM blames network, alerts show power failure
```

The output includes a confidence matrix:

```
{
  "conclusion": "Root cause: power failure at site DC-042",
  "confidence": 0.88,
  "evidence": [
    {"source": "alert_logs", "supports": true, "detail": "UPS failure at 14:32"},
    {"source": "technician_order", "supports": true, "detail": "Power restoration at 16:15"},
    {"source": "crm_complaint", "supports": true, "detail": "Customer reported outage at 14:35"},
    {"source": "sla_data", "supports": false, "detail": "SLA recorded as network issue (misclassification)"}
  ],
  "conflicts": [
    {
      "type": "attribution_mismatch",
      "sources": ["alert_logs", "sla_data"],
      "resolution": "SLA misclassified; alert logs are authoritative"
    }
  ]
}
```

I used this for a complaint investigation where the customer claimed 5 outages, the monitoring system showed 2, and the technician's work orders confirmed 3. The evidence chain revealed that 2 of the customer's reported outages were actually a single event they perceived as separate — and 1 real outage wasn't captured by monitoring due to a probe failure. Without cross-validation, we'd have either dismissed the customer's complaint or over-escalated.

**When to use it:** Complaint investigation, incident root cause analysis, audit verification, any scenario where truth lies across multiple systems.

An Approval Skill adds a mandatory human checkpoint before high-risk operations execute. It auto-assesses risk level, generates an approval request, and waits for explicit confirmation.

```
# SKILL.md — Human-in-Loop Approval

## Risk Assessment Matrix
| Level | Criteria | Examples |
|-------|----------|----------|
| L1 | Read-only, no data exposure | Query internal database |
| L2 | Read-only, contains sensitive data | View customer PII |
| L3 | Write to internal systems | Update customer record |
| L4 | External communication | Send email, post to chat |
| L5 | Irreversible or bulk operations | Delete records, export all data |

## Approval Workflow
- L1-L2: Execute automatically, log action
- L3: Execute with confirmation prompt
- L4: Require approval with content preview
- L5: Require approval + supervisor notification + audit trail

## Approval Request Format
```

json

{

"risk_level": "L4",

"action": "send_email",

"recipient": "[client@company.com](mailto:client@company.com)",

"content_preview": "Dear Client, regarding your 5G deployment...",

"requires_approval_from": "supervisor",

"audit_trail_id": "AIL-20260628-0042"

}

```
The key design principle: **never auto-execute L4+ operations**. I learned this the hard way when an agent auto-sent a draft client email that contained internal pricing notes. The human-in-loop layer now catches every L4+ action before it leaves the system.
```

python

def execute_with_approval(action):

risk = assess_risk(action)

if risk in ["L1", "L2"]:

return execute(action)

elif risk == "L3":

if confirm_with_user(action):

return execute(action)

elif risk in ["L4", "L5"]:

approval = request_approval(action, require_supervisor=(risk=="L5"))

if approval.status == "approved":

log_audit(action, approval)

return execute(action)

else:

return {"status": "rejected", "reason": approval.reason}

```
**When to use it:** Any action that sends data externally, modifies records, or performs irreversible operations.

---

## Level 9: Composite Skill — 5+ Skills Orchestrated Pipeline

A Composite Skill chains five or more specialized skills into a coordinated pipeline. Each skill retains its independence, but the composite skill defines the overall flow and data contracts between them.

Here's a real example — a **Customer Operations Dashboard** that combines six skills:
```

plaintext

User: "Show me the complaint trend for enterprise customers in Q2"

```
↓
```

[L3-GW-01: Data Query Gateway] — routes the request

↓

[L3-NL-01: NL2Query] — converts natural language to SQL

↓

[Security-Guard] — checks query permissions

↓

[L3-DB-01: Data Executor] — executes the validated SQL

↓

[L3-AG-01: Data Aggregator] — calculates trends, YoY, rankings

↓

[L3-VZ-01: Visualization Renderer] — generates ECharts dashboard

↓

Output: Interactive HTML dashboard with complaint trends

```
The composite skill's `SKILL.md` defines the orchestration:
```

markdown

Phase 2 → Phase 3: SQL string + query_metadata JSON

Phase 3 → Phase 4: Validated SQL + permission_token

Phase 4 → Phase 5: Raw result set JSON

Phase 5 → Phase 6: Aggregated data JSON + chart_suggestions

```
The beauty of this approach: each component skill can be swapped, upgraded, or tested independently. When we switched the visualization engine from Chart.js to ECharts, we only touched Phase 6. The other five phases didn't change at all.

**When to use it:** Complex business workflows that need querying, processing, validating, and presenting — dashboards, report pipelines, analysis suites.

---

## Level 10: Closure Skill — 8+ Skills End-to-End Business Loop

This is the final form. A Closure Skill doesn't just process data — it closes a business loop from intent to execution to archival. It orchestrates eight or more skills in a complete cycle, with human checkpoints, security gates, and knowledge retention built in.

Here's the architecture I built for **enterprise customer operations** (codename: ArkClaw):
```

plaintext

Step 1: Intent Understanding

→ NL2Query + Info-Extractor

↓

Step 2: Multi-Source Query

→ Data Executor + Knowledge-RAG

↓

Step 3: Rule-Based Scoring

→ Scoring Engine (YAML-configured)

↓

Step 4: Evidence Verification

→ Evidence Chain (cross-source validation)

↓

Step 5: Root Cause Mapping

→ Root-Cause-Mapper (topology + 3-layer reasoning)

↓

Step 6: Human Approval

→ Human-in-Loop (risk-gated confirmation)

↓

Step 7: Execution & Archival

→ Archive-Manager (tag, sanitize, persist)

↓

Step 8: Visual Output

→ Visualization Renderer + Report Generator

```
The full `SKILL.md` is substantial, but here's the core protocol:
```

markdown

This skill implements a complete operations loop:

Understand → Investigate → Score → Verify → Diagnose → Approve → Execute → Visualize

```
I ran this end-to-end for a real client — analyzing Jiangling Motors Group's 5G private network opportunity. The system scored their business potential at 84/100 (high opportunity), identified a moderate churn risk of 55/100, and generated a 15-page DOCX report with a one-page executive summary, data tables comparing revenue and contract timelines, and prioritized action recommendations. Total time: about 4 minutes. A human analyst would take 4 hours for the same depth.

The critical difference between Level 9 and Level 10? **Level 10 closes the loop.** It doesn't just produce output — it archives the analysis for future reference, updates the knowledge base with new patterns, and creates an audit trail. The next time someone asks about the same customer, the system starts from accumulated knowledge, not from scratch.

---

## Putting It All Together: The Level Progression

Here's how I think about when to move up a level:

| Level | What You Get | What It Costs | Signal to Level Up |
|-------|-------------|---------------|-------------------|
| 1 | Zero setup | Zero flexibility | "I need if/then logic" |
| 2 | Knowledge + scripts | File management | "I need branching workflows" |
| 3 | Decision trees | More complex prompts | "I need separate agents per step" |
| 4 | Multi-agent isolation | Orchestration overhead | "I need permission controls" |
| 5 | Security gates | Config complexity | "I need configurable rules" |
| 6 | Configurable scoring | YAML maintenance | "I need cross-source verification" |
| 7 | Evidence confidence | More data sources | "I need human checkpoints" |
| 8 | Risk-gated approval | Latency from waits | "I need a full pipeline" |
| 9 | End-to-end pipeline | Coordination complexity | "I need business loop closure" |
| 10 | Complete closure | Maximum architecture | "This IS my business process" |

One thing I want to emphasize: **you don't always need Level 10.** A Level 3 workflow skill is the right tool for a troubleshooting guide. A Level 6 scoring skill is perfect for lead qualification. The levels aren't a maturity model — they're a design space. Pick the level that matches your problem's complexity.

---

## The Pattern That Connects All Ten Levels

Looking back across all ten levels, I see one recurring pattern: **separation of concerns through structured protocols.**

- Level 1: Concerns mixed in one file
- Level 2: Knowledge separated from instructions
- Level 3: Steps separated with decision points
- Level 4: Agents separated with JSON contracts
- Level 5: Security separated as a gate layer
- Level 6: Rules separated into YAML config
- Level 7: Evidence separated by source
- Level 8: Approval separated by risk level
- Level 9: Pipeline separated into composable stages
- Level 10: Business logic separated from technical execution

Every level up is an act of separating something that was previously coupled. And the mechanism for that separation is always the same: a **structured data contract** (usually JSON) that one component produces and the next component consumes.

---

## What's Next?

I'm currently teaching a course where students build skills starting at Level 1 and work their way up to creating their own Level 9 composite skills in a two-hour hands-on session. The biggest "aha" moment? When they realize that the jump from Level 3 to Level 4 — from a single agent with a decision tree to multiple agents with structured handoffs — is the inflection point. Everything before Level 4 is prompt engineering. Everything from Level 4 onward is system engineering.

What level are your current AI skills at? And more importantly — what level do they *need* to be at to solve your actual business problems? I'd love to hear where you are in this progression.
```