The Ten Levels of AI Skill Construction - From Prompt to Business Closure System

A developer mapped ten distinct levels of AI skill construction, from a single prompt file to a full business closure system, based on a year of building AI agent skills for enterprise clients. The developer found that many treat skills as fancy prompts, but real business workflows require structured components, workflows, and orchestration. The first three levels—Single Prompt, Component Skill, and Workflow Skill—illustrate the progression from simple text instructions to multi-step processes with conditional branching.

I've spent the last year building AI agent skills for enterprise clients, and I kept running into the same problem: everyone treats "skills" as just fancy prompts. They write a Markdown file, call it a skill, and wonder why their agent can't handle real business workflows. So I mapped out ten distinct levels of skill construction, from a single prompt file to a full business closure system that orchestrates eight-plus skills end-to-end. Here's what I learned at each level.

## Level 1: Single Prompt

This is where everyone starts. You write one Markdown file, stick it in a folder, and your agent reads it as instructions. That's it. No scripts, no references, no assets, just text telling the AI what to do.

```markdown
# SKILL.md: Meeting Minutes Organizer

You are a meeting minutes organizer. When the user provides meeting content,
extract the following fields and output structured results:

## Output Format
- Meeting Time
- Meeting Location
- Participants
- Decisions
- Action Items

Always respond in the same language as the input.
```

I've found this works surprisingly well for simple, well-scoped tasks. A meeting minutes extractor? Perfect. A knowledge-base Q&A bot with a single document set? Also fine. But the moment your skill needs to handle edge cases, reference external data, or follow conditional logic, you hit a wall.

**When to use it:** Quick prototypes, demos, and tasks that fit in a single prompt window.

**When to outgrow it:** The first time you catch yourself writing "if the user asks X, do Y; if they ask Z, do W..." inside one Markdown file.

## Level 2: Component Skill

A Component Skill adds structure around the prompt. You still have your `SKILL.md`, but now it references other files: knowledge documents, Python scripts, YAML configs, or static assets the agent can load at runtime.
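To make "load at runtime" a little more concrete, here is a minimal sketch of how a runtime might read such a skill folder. Everything in it is an assumption for illustration: the `load_skill` helper, the returned dictionary shape, and the sample file names are hypothetical, and real agent frameworks may discover supporting files quite differently.

```python
from pathlib import Path
import tempfile


def load_skill(skill_dir):
    """Read SKILL.md and index the supporting files that sit next to it."""
    root = Path(skill_dir)
    instructions = (root / "SKILL.md").read_text(encoding="utf-8")
    resources = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Keep paths relative to the skill root, the same way SKILL.md refers to them.
        resources[sub.name] = sorted(
            f.relative_to(root).as_posix() for f in sub.rglob("*") if f.is_file()
        )
    return {"instructions": instructions, "resources": resources}


# Build a throwaway skill folder to show the shape of the result.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "SKILL.md").write_text(
        "Load documents from knowledge/ before answering.", encoding="utf-8"
    )
    (root / "knowledge").mkdir()
    (root / "knowledge" / "faq.md").write_text("Q: ...", encoding="utf-8")
    skill = load_skill(root)

print(skill["resources"])  # {'knowledge': ['knowledge/faq.md']}
```

The point of the sketch is only that the prompt text and the supporting resources are loaded as separate things, so either can change without touching the other.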
```plaintext
my-skill/
├── SKILL.md
├── knowledge/
│   └── telecom-faq.md
├── scripts/
│   └── validate_input.py
└── assets/
    └── template.xlsx
```

The key insight: `SKILL.md` now acts as a **coordinator**, pointing the agent to the right resources rather than stuffing everything into one file.

```markdown
# SKILL.md: Knowledge RAG Assistant

## Knowledge Base
Load documents from the `knowledge/` directory before answering.

## Validation
Before processing user input, run `scripts/validate_input.py` to check format.

## Output Template
Use `assets/template.xlsx` as the output format for structured reports.
```

I built a telecom FAQ skill this way: the knowledge base was 200+ pages of installation manuals, and the validation script caught malformed addresses before they caused downstream errors. The prompt stayed clean; the heavy lifting lived in the supporting files.

**When to use it:** Any skill that needs domain knowledge, input validation, or structured output templates.

## Level 3: Workflow Skill

This is where things get interesting. A Workflow Skill defines a multi-step process with explicit decision points. Instead of "do this," you write "if X, go to step 3; if Y, go to step 5."

```markdown
# SKILL.md: Complaint Handling Workflow

## Phase 1: Information Extraction
Extract: customer_id, complaint_type, urgency_level

## Phase 2: Routing Decision
- If urgency_level = "critical" → Phase 3A (Escalation)
- If urgency_level = "normal" AND complaint_type = "billing" → Phase 3B (Billing Track)
- Otherwise → Phase 3C (Standard Track)

## Phase 3A: Escalation
Notify supervisor, create priority ticket, SLA = 2 hours

## Phase 3B: Billing Track
Pull billing records, calculate discrepancy, auto-refund if < $50

## Phase 3C: Standard Track
Create standard ticket, SLA = 48 hours

## Phase 4: Closure
Confirm resolution with customer, update knowledge base
```

The breakthrough for me was realizing that workflow skills don't need code; they need **clarity**. When I wrote unambiguous branching logic in plain Markdown, the agent followed it correctly 90%+ of the time.
The 10% failure rate came from vague wording, not missing code.

**When to use it:** Any multi-step process with conditional branching: complaint handling, onboarding flows, troubleshooting guides.

## Level 4: Orchestration Skill

At Level 4, we stop pretending one agent can do everything. An Orchestration Skill uses a Phase-Orchestrator to launch independent sub-agents for each phase, passing structured JSON between them.

Here's what the flow looks like:

```plaintext
User Request
    ↓
Phase 1: Info-Extractor (sub-agent A)
    ↓ JSON: extracted fields
Phase 2: Data-Analyst (sub-agent B)
    ↓ JSON: analysis results
Phase 3: Report-Generator (sub-agent C)
    ↓ Markdown report
Final Output
```

The `SKILL.md` defines the protocol:

````markdown
# SKILL.md: Data Analysis Pipeline

## Orchestration Protocol
This skill uses Phase-Orchestrator for multi-Agent execution.
Each Phase runs as an independent sub-agent.

### Phase 1: Info-Extractor
- Input: raw user text
- Output: JSON with extracted fields
- Pass to: Phase 2

### Phase 2: Data-Analyst
- Input: Phase 1 JSON output
- Output: JSON with analysis results, insights, anomalies
- Pass to: Phase 3

### Phase 3: Report-Generator
- Input: Phase 2 JSON output
- Output: Markdown report
- Pass to: user

## Inter-Phase Data Contract
```json
{
  "extracted_fields": {...},
  "analysis_results": {...},
  "report_markdown": "..."
}
```
````

Why sub-agents instead of one agent doing all phases? **Isolation**. If Phase 2 crashes, Phase 1's extracted data isn't lost. Each sub-agent starts fresh with a clean context, so you don't hit token limits. And you can swap out Phase 2's skill without touching the others.

I tested this with a 4-phase financial report pipeline. Single-agent approach: 70% completion rate, frequent context loss. Orchestrated approach: 95% completion rate, clean handoffs every time.

**When to use it:** Any task that takes more than one distinct transformation: extract → analyze → report, query → aggregate → visualize, etc.

## Level 5: Security Skill

Security isn't a luxury; it's a layer. A Security Skill wraps protective checks around your other skills.
It enforces the principle of least privilege, scans for dangerous actions, and blocks operations that exceed authorized scope.

```yaml
# security-guard-config.yaml
skill_permissions:
  info-extractor:
    allowed_tools: [read, search]
    blocked_tools: [delete, write, execute]
    data_scope: "customer_profile_readonly"

sensitive_fields:
  - field: id_card_number
    action: mask
  - field: phone_number
    action: mask_last_4

defense_rules:
  - pattern: "ignore.*previous.*instructions"
    action: block_and_log
  - pattern: "pretend.*you.*are"
    action: warn_and_confirm
  - pattern: "export.*all.*data"
    action: require_approval
```

The Security Skill sits in the orchestration pipeline as a gate:

```plaintext
Phase 1: Info-Extractor
    ↓
Security-Guard (check permissions, mask sensitive fields)
    ↓
Phase 2: Data-Analyst
    ↓
Security-Guard (validate output, check for data leaks)
    ↓
Phase 3: Report-Generator
```

I ran a red-team test against one of my skills. Without Security-Guard: 4 out of 6 attack vectors succeeded (prompt injection, data exfiltration, privilege escalation, unauthorized export). With Security-Guard: 0 out of 6 succeeded. The defense rules caught injection patterns, and the permission config blocked unauthorized tool access.

**When to use it:** Always. Seriously. Any skill that touches real data or performs real actions needs this layer.

## Level 6: Scoring Skill

A Scoring Skill evaluates business objects against configurable rules and weights. The rules live in YAML, not in the prompt, so when business logic changes, you update the config, not the skill.
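The idea of keeping rules in configuration rather than in the prompt can be sketched in a few lines. This is an illustration under stated assumptions, not the engine behind the article: the `DIMENSIONS` dictionary stands in for an already-parsed config file, each band is read as `[low, high, score]` with `low <= value < high` and `None` meaning open-ended, and rules inside a dimension are simply averaged. The field names and numbers are made up for the example.

```python
def score_range(value, ranges):
    """Return the score of the first [low, high, score] band that contains value.

    Assumption: bands are half-open (low <= value < high); None means unbounded.
    """
    for low, high, score in ranges:
        if (low is None or value >= low) and (high is None or value < high):
            return score
    raise ValueError(f"no band matches {value!r}")


def weighted_score(record, dimensions):
    """Average the rule scores inside each dimension, then apply the weights."""
    total = 0.0
    for dim in dimensions:
        rule_scores = [score_range(record[r["field"]], r["ranges"]) for r in dim["rules"]]
        total += dim["weight"] * sum(rule_scores) / len(rule_scores)
    return total


# Hypothetical config; in practice this would be loaded from a YAML file.
DIMENSIONS = [
    {"name": "business_potential", "weight": 0.5,
     "rules": [{"field": "revenue_million",
                "ranges": [[0, 100, 1], [100, 500, 2], [500, None, 3]]}]},
    {"name": "churn_risk", "weight": 0.5,
     "rules": [{"field": "months_to_expiry",
                "ranges": [[None, 6, 3], [6, 12, 2], [12, None, 1]]}]},
]

record = {"revenue_million": 250, "months_to_expiry": 4}
result = weighted_score(record, DIMENSIONS)
print(result)  # 2.5
```

Changing a weight or a band here means editing data only; `score_range` and `weighted_score` stay untouched, which is the property the config-driven approach is after.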
```yaml
# scoring-rules.yaml
object_type: "enterprise_customer"

dimensions:
  - name: "business_potential"
    weight: 0.35
    rules:
      - id: "annual_revenue"
        field: "revenue_million"
        operator: "range"
        ranges:
          - [0, 100, 1]
          - [100, 500, 2]
          - [500, 1000, 3]
          - [1000, 5000, 4]
          - [5000, null, 5]
      - id: "growth_rate"
        field: "yoy_growth_pct"
        operator: "range"
        ranges:
          - [null, -5, 1]
          - [-5, 5, 2]
          - [5, 15, 3]
          - [15, 30, 4]
          - [30, null, 5]

  - name: "churn_risk"
    weight: 0.30
    rules:
      - id: "contract_expiry"
        field: "months_to_expiry"
        operator: "range"
        ranges:
          - [null, 1, 5]
          - [1, 3, 4]
          - [3, 6, 3]
          - [6, 12, 2]
          - [12, null, 1]
      - id: "complaint_count"
        field: "complaints_last_6m"
        operator: "range"
        ranges:
          - [5, null, 5]
          - [3, 5, 4]
          - [1, 3, 3]
          - [0, 1, 1]

  - name: "tech_readiness"
    weight: 0.35
    rules:
      - id: "digital_maturity"
        field: "digital_score"
        operator: "direct"
        max: 5
```

The scoring engine follows an orchestrated pipeline:

```plaintext
Phase 1: Info-Extractor → pull customer data from input
Phase 2: Knowledge-RAG → match scoring rules from YAML
Phase 3: Data-Analyst → calculate weighted scores per dimension
Phase 4: Report-Generator → output scorecard with recommendations
```

I built this for a telecom client who needed to score enterprise customers for 5G private network sales opportunities. They changed the weighting three times in the first month; each time, I updated two numbers in the YAML and redeployed. Zero code changes.

**When to use it:** Lead scoring, risk assessment, supplier evaluation, partner grading: anything that needs multi-dimensional weighted evaluation.

## Level 7: Verification Skill

A Verification Skill doesn't trust any single data source. It pulls evidence from multiple independent sources, cross-validates them, detects conflicts, and produces confidence-scored conclusions.

```markdown
# SKILL.md: Evidence Chain Analyzer

## Evidence Sources
1. Customer complaint records (CRM)
2. System alert logs (monitoring)
3. SLA performance data (operations)
4. Technician work orders (field)

## Cross-Validation Rules
- If ≥2 sources agree → confidence = 0.85
- If all sources agree → confidence = 0.95
- If sources conflict → flag conflict, lower confidence to 0.50
- If only 1 source available → confidence = 0.40, flag for manual review

## Conflict Detection
- Timeline mismatch: event A reported before cause B
- Quantity mismatch: complaint says 3 outages, logs show 1
- Attribution mismatch: CRM blames network, alerts show power failure
```

The output includes a confidence matrix:

```json
{
  "conclusion": "Root cause: power failure at site DC-042",
  "confidence": 0.88,
  "evidence": [
    {"source": "alert_logs", "supports": true, "detail": "UPS failure at 14:32"},
    {"source": "technician_order", "supports": true, "detail": "Power restoration at 16:15"},
    {"source": "crm_complaint", "supports": true, "detail": "Customer reported outage at 14:35"},
    {"source": "sla_data", "supports": false, "detail": "SLA recorded as network issue (misclassification)"}
  ],
  "conflicts": [
    {
      "type": "attribution_mismatch",
      "sources": ["alert_logs", "sla_data"],
      "resolution": "SLA misclassified; alert logs are authoritative"
    }
  ]
}
```

I used this for a complaint investigation where the customer claimed 5 outages, the monitoring system showed 2, and the technician's work orders confirmed 3. The evidence chain revealed that 2 of the customer's reported outages were actually a single event they perceived as separate, and 1 real outage wasn't captured by monitoring due to a probe failure. Without cross-validation, we'd have either dismissed the customer's complaint or over-escalated.

**When to use it:** Complaint investigation, incident root cause analysis, audit verification, any scenario where truth lies across multiple systems.

## Level 8: Approval Skill

An Approval Skill adds a mandatory human checkpoint before high-risk operations execute. It auto-assesses risk level, generates an approval request, and waits for explicit confirmation.
````markdown
# SKILL.md: Human-in-Loop Approval

## Risk Assessment Matrix
| Level | Criteria | Examples |
|-------|----------|----------|
| L1 | Read-only, no data exposure | Query internal database |
| L2 | Read-only, contains sensitive data | View customer PII |
| L3 | Write to internal systems | Update customer record |
| L4 | External communication | Send email, post to chat |
| L5 | Irreversible or bulk operations | Delete records, export all data |

## Approval Workflow
- L1-L2: Execute automatically, log action
- L3: Execute with confirmation prompt
- L4: Require approval with content preview
- L5: Require approval + supervisor notification + audit trail

## Approval Request Format
```json
{
  "risk_level": "L4",
  "action": "send_email",
  "recipient": "client@company.com",
  "content_preview": "Dear Client, regarding your 5G deployment...",
  "requires_approval_from": "supervisor",
  "audit_trail_id": "AIL-20260628-0042"
}
```
````

The key design principle: **never auto-execute L4+ operations**. I learned this the hard way when an agent auto-sent a draft client email that contained internal pricing notes. The human-in-loop layer now catches every L4+ action before it leaves the system.

```python
# assess_risk, execute, confirm_with_user, request_approval and log_audit
# are assumed to be provided elsewhere by the skill runtime.
def execute_with_approval(action):
    risk = assess_risk(action)

    if risk in ("L1", "L2"):
        # Low risk: run immediately.
        return execute(action)
    elif risk == "L3":
        # Internal writes: a simple confirmation prompt is enough.
        if confirm_with_user(action):
            return execute(action)
        # Return an explicit result instead of falling through with None.
        return {"status": "rejected", "reason": "user declined confirmation"}
    elif risk in ("L4", "L5"):
        # External or irreversible actions: formal approval, supervisor for L5.
        approval = request_approval(action, require_supervisor=(risk == "L5"))
        if approval.status == "approved":
            log_audit(action, approval)
            return execute(action)
        else:
            return {"status": "rejected", "reason": approval.reason}
```

**When to use it:** Any action that sends data externally, modifies records, or performs irreversible operations.

---

## Level 9: Composite Skill (5+ Skills Orchestrated Pipeline)

A Composite Skill chains five or more specialized skills into a coordinated pipeline. Each skill retains its independence, but the composite skill defines the overall flow and data contracts between them.
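One way to picture "independent skills plus data contracts" is a list of stages, each declaring the keys it promises to hand to the next one. The sketch below is a toy under assumptions: the `Stage` class, the `run_pipeline` helper, and the three lambda stages with hard-coded outputs are hypothetical and only show the contract check, not real skills.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    run: Callable[[dict], dict]  # takes the previous payload, returns a new one
    produces: frozenset          # keys this stage promises to hand over


def run_pipeline(stages, payload):
    """Run stages in order, checking each output against the stage's contract."""
    for stage in stages:
        payload = stage.run(payload)
        missing = stage.produces - set(payload)
        if missing:
            raise ValueError(f"{stage.name} is missing contract keys: {sorted(missing)}")
    return payload


# Three toy stages with hard-coded outputs, purely for illustration.
stages = [
    Stage("nl2query",
          lambda p: {"sql": "SELECT 1", "question": p["question"]},
          frozenset({"sql"})),
    Stage("security_guard",
          lambda p: {**p, "permission_token": "demo"},
          frozenset({"sql", "permission_token"})),
    Stage("executor",
          lambda p: {"rows": [[1]]},
          frozenset({"rows"})),
]

result = run_pipeline(stages, {"question": "complaint trend in Q2"})
print(result)  # {'rows': [[1]]}
```

Because each stage is checked only against the keys it declares, a stage can be replaced by any other callable that honours the same contract, which is what makes the individual skills swappable.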
Here's a real example, a Customer Operations Dashboard that combines six skills:

```plaintext
User: "Show me the complaint trend for enterprise customers in Q2"
    ↓
L3-GW-01: Data Query Gateway (routes the request)
    ↓
L3-NL-01: NL2Query (converts natural language to SQL)
    ↓
Security-Guard (checks query permissions)
    ↓
L3-DB-01: Data Executor (executes the validated SQL)
    ↓
L3-AG-01: Data Aggregator (calculates trends, YoY, rankings)
    ↓
L3-VZ-01: Visualization Renderer (generates ECharts dashboard)
    ↓
Output: Interactive HTML dashboard with complaint trends
```

The composite skill's `SKILL.md` defines the orchestration:

```markdown
Phase 2 → Phase 3: SQL string + query metadata (JSON)
Phase 3 → Phase 4: Validated SQL + permission token
Phase 4 → Phase 5: Raw result set (JSON)
Phase 5 → Phase 6: Aggregated data (JSON) + chart suggestions
```

The beauty of this approach: each component skill can be swapped, upgraded, or tested independently. When we switched the visualization engine from Chart.js to ECharts, we only touched Phase 6. The other five phases didn't change at all.

**When to use it:** Complex business workflows that need querying, processing, validating, and presenting: dashboards, report pipelines, analysis suites.

---

## Level 10: Closure Skill (8+ Skills End-to-End Business Loop)

This is the final form. A Closure Skill doesn't just process data; it closes a business loop from intent to execution to archival. It orchestrates eight or more skills in a complete cycle, with human checkpoints, security gates, and knowledge retention built in.
Here's the architecture I built for enterprise customer operations (codename: ArkClaw):

```plaintext
Step 1: Intent Understanding → NL2Query + Info-Extractor
    ↓
Step 2: Multi-Source Query → Data Executor + Knowledge-RAG
    ↓
Step 3: Rule-Based Scoring → Scoring Engine (YAML-configured)
    ↓
Step 4: Evidence Verification → Evidence Chain (cross-source validation)
    ↓
Step 5: Root Cause Mapping → Root-Cause-Mapper (topology + 3-layer reasoning)
    ↓
Step 6: Human Approval → Human-in-Loop (risk-gated confirmation)
    ↓
Step 7: Execution & Archival → Archive-Manager (tag, sanitize, persist)
    ↓
Step 8: Visual Output → Visualization Renderer + Report Generator
```

The full `SKILL.md` is substantial, but here's the core protocol:

```markdown
This skill implements a complete operations loop:
Understand → Investigate → Score → Verify → Diagnose → Approve → Execute → Visualize
```

I ran this end-to-end for a real client, analyzing Jiangling Motors Group's 5G private network opportunity. The system scored their business potential at 84/100 (high opportunity), identified a moderate churn risk of 55/100, and generated a 15-page DOCX report with a one-page executive summary, data tables comparing revenue and contract timelines, and prioritized action recommendations. Total time: about 4 minutes. A human analyst would take 4 hours for the same depth.

The critical difference between Level 9 and Level 10? Level 10 closes the loop. It doesn't just produce output; it archives the analysis for future reference, updates the knowledge base with new patterns, and creates an audit trail. The next time someone asks about the same customer, the system starts from accumulated knowledge, not from scratch.
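The "starts from accumulated knowledge" behaviour can be sketched as a loop that reads the archive before analysing and writes to it afterwards. This is a deliberately small illustration under assumptions: the `Archive` class is a hypothetical file-backed stand-in for an archive component, `run_closure` collapses the analysis steps into a single callable, and `fake_analyze` returns placeholder values rather than real results.

```python
import json
import tempfile
from pathlib import Path


class Archive:
    """Tiny file-backed store keyed by customer id (hypothetical stand-in)."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, customer_id):
        return self.root / f"{customer_id}.json"

    def load(self, customer_id):
        path = self._path(customer_id)
        return json.loads(path.read_text(encoding="utf-8")) if path.exists() else []

    def append(self, customer_id, record):
        history = self.load(customer_id)
        history.append(record)
        self._path(customer_id).write_text(json.dumps(history), encoding="utf-8")


def run_closure(customer_id, analyze, archive):
    prior = archive.load(customer_id)     # start from accumulated knowledge
    result = analyze(customer_id, prior)  # analysis steps collapsed into one call
    archive.append(customer_id, result)   # close the loop: persist for next time
    return result


def fake_analyze(customer_id, prior):
    # Placeholder analysis that only reports whether history was available.
    return {"run": len(prior) + 1, "seen_before": bool(prior)}


with tempfile.TemporaryDirectory() as tmp:
    archive = Archive(tmp)
    first = run_closure("cust-001", fake_analyze, archive)
    second = run_closure("cust-001", fake_analyze, archive)

print(first)   # {'run': 1, 'seen_before': False}
print(second)  # {'run': 2, 'seen_before': True}
```

The second run sees the record written by the first, which is the whole difference between producing output and closing the loop.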

---

## Putting It All Together: The Level Progression

Here's how I think about when to move up a level:

| Level | What You Get | What It Costs | Signal to Level Up |
|-------|-------------|---------------|-------------------|
| 1 | Zero setup | Zero flexibility | "I need if/then logic" |
| 2 | Knowledge + scripts | File management | "I need branching workflows" |
| 3 | Decision trees | More complex prompts | "I need separate agents per step" |
| 4 | Multi-agent isolation | Orchestration overhead | "I need permission controls" |
| 5 | Security gates | Config complexity | "I need configurable rules" |
| 6 | Configurable scoring | YAML maintenance | "I need cross-source verification" |
| 7 | Evidence confidence | More data sources | "I need human checkpoints" |
| 8 | Risk-gated approval | Latency from waits | "I need a full pipeline" |
| 9 | End-to-end pipeline | Coordination complexity | "I need business loop closure" |
| 10 | Complete closure | Maximum architecture | "This IS my business process" |

One thing I want to emphasize: you don't always need Level 10. A Level 3 workflow skill is the right tool for a troubleshooting guide. A Level 6 scoring skill is perfect for lead qualification. The levels aren't a maturity model; they're a design space. Pick the level that matches your problem's complexity.

---

## The Pattern That Connects All Ten Levels

Looking back across all ten levels, I see one recurring pattern: **separation of concerns through structured protocols**.
- Level 1: Concerns mixed in one file
- Level 2: Knowledge separated from instructions
- Level 3: Steps separated with decision points
- Level 4: Agents separated with JSON contracts
- Level 5: Security separated as a gate layer
- Level 6: Rules separated into YAML config
- Level 7: Evidence separated by source
- Level 8: Approval separated by risk level
- Level 9: Pipeline separated into composable stages
- Level 10: Business logic separated from technical execution

Every level up is an act of separating something that was previously coupled. And the mechanism for that separation is always the same: a structured data contract (usually JSON) that one component produces and the next component consumes.

---

## What's Next?

I'm currently teaching a course where students build skills starting at Level 1 and work their way up to creating their own Level 9 composite skills in a two-hour hands-on session. The biggest "aha" moment? When they realize that the jump from Level 3 to Level 4, from a single agent with a decision tree to multiple agents with structured handoffs, is the inflection point. Everything before Level 4 is prompt engineering. Everything from Level 4 onward is system engineering.

What level are your current AI skills at? And more importantly, what level do they need to be at to solve your actual business problems? I'd love to hear where you are in this progression.