{"slug": "why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix", "title": "Why Your AI Agent Keeps Making the Same Mistakes (And a Structured Fix)", "summary": "A structured solution (the \"PITFALL Rules\") for organizing error-related knowledge in AI agent skill files, addressing the common problem of agents repeatedly making the same mistakes. The author argues that simply expanding context windows is insufficient due to the \"lost in the middle\" effect, and instead advocates for using decision tables grouped by priority to allow agents to quickly diagnose and fix errors. A/B test results showed that the new structured format reduced reasoning steps and eliminated ambiguity compared to traditional flat-list documentation.", "body_md": "I ran into this while maintaining production agent skills. Here's what I found: as far as I can tell, nobody has systematically addressed how to organize error-related knowledge inside skills. Maybe someone has and I just haven't found it — but in everything I looked at, this angle was missing. Here's what worked for me.\nI looked around the usual places — official forums, technical blogs, developer communities. One pattern stands out: everyone checks whether skill files resolve correctly and whether content is duplicated. But nobody asks: when an agent hits an error mid-execution, can it efficiently find the correct fix in SKILL.md or its reference files?\nA fair question: if context windows keep growing, don't bigger contexts make structure unnecessary?\nPartially. Bigger contexts solve reading — the model can ingest everything. But they don't solve attention. Anthropic's own context engineering research documents the \"lost in the middle\" effect: critical details buried in a long flat list get overlooked. Every irrelevant item your agent has to scan past is noise competing with the signal it actually needs.\nThink of it this way: you could give someone a 500-page manual or a 1-page troubleshooting card. 
They can read both, but the card is faster. This is an efficiency problem, not a capability problem.\nAfter banging my head against this for a while, I landed on five rules for organizing error documentation in skills (the PITFALL Rules):\nRule 1: Group by priority. The specific category names depend on what your skill handles. Key principle: anomaly diagnosis always goes first, because that's what agents need when something breaks.\nRule 2: Use decision tables for diagnostics. Instead of prose like \"sometimes the task times out because there's too much data and you should split it\", write this:\n| Symptom | Diagnosis | Fix |\n|---------|-----------|-----|\n| Task timeout, no output written | Check if output file exists | Don't retry same config. Split input and rerun |\n| Same input fails ≥3 times | Persistent bottleneck | Bypass delegation, process directly in main session |\nAn agent can scan the Symptom column, find a match, and read the Fix: typically 1-2 steps instead of reading through 30 prose items.\nRule 3: Don't repeat the skill body. This rule applies to operation-manual-style skills — ones with detailed step-by-step instructions and inline warnings (⚠️). If your skill body already explains a gotcha right next to the relevant operation, the pitfall section should give only a one-line cross-reference. Pitfalls cover blind spots the body doesn't address.\nFor reference-style or API-doc-style skills where the body doesn't include inline warnings, this rule doesn't apply — all error knowledge goes in the pitfall section.\nRule 4: Before adding any item, walk through this checklist: Is the section already categorized? → Which category? → Does it need a decision table? → Does it duplicate an existing item? → Too many items? Consider splitting it into a separate file.\nRule 5: After refactoring, check 7 dimensions: categorized, decision tables for diagnostics, no flat list longer than 5 items, no duplicates, less than 50% overlap with the body text, no prose narratives, and no information loss.\nI wanted data, not just intuition. So I ran A/B tests.\nI picked three real errors from production agent skills. 
For each error, I constructed two isolated contexts — one group of agents only saw the old flat-list pitfall documentation, another only saw the new structured version. Same error description, same prompt, multiple runs per version to reduce randomness.\nEach test run is scored on 4 dimensions (diagnosis, fix, wrong suggestions, and reasoning steps), combined into one number:\n`Overall Score = Diagnosis + Fix − Wrong Suggestions − (Steps > 3 ? 1 : 0)`. Max: 4. Min: -2.\nScenario A\nNew version: Decision table placed as the first category, Fix column directly says \"split to 1 item per task\"\nShared behavior: Both versions correctly diagnosed the root cause — dense analysis workload + 2-item batch config\nOld version's quirk: Misunderstood \"batch\" as a 2-item unit, suggested splitting into 2 sub-agents instead of 4; averaged 3.7 reasoning steps\nNew version's advantage: No ambiguity in the Fix column; averaged 3.0 reasoning steps\nScore: Flat list 3.0 avg → Structured 4.0 avg (+1.0)\nScenario B\nNew version: Consolidated into one decision table — symptom → diagnosis → fix at a glance\nShared behavior: Both versions tried to match the error against known patterns\nOld version's quirk: Diagnosed as \"session ID changed after navigation\" — plausible but wrong, assembled from fragments across 4 sections\nNew version's advantage: Found a partial match against the decision table, explicitly flagged it as partial, gave a conservative fix (stop and report) plus a meta-suggestion to add a new row for this scenario\nScore: Flat list 1.0 → Structured 3.0 (+2.0)\nScenario C\nNew version: Placed under a \"Database\" category with a fixed position\nShared behavior: Both versions scored perfectly — diagnosis and fix both correct\nOld version's quirk: The item was well-written enough to find regardless of position\nNew version's advantage: Category placement makes location more predictable, but in this case it didn't matter\nScore: Flat list 4.0 → Structured 4.0 (0)\nThree findings:\n1. Scattered information = biggest win. 
Scenario B went from 4 sections of 24 items to 1 decision table — a +2.0 improvement. When error knowledge is scattered across a document, agents waste steps piecing together clues.\n2. Decision tables remove ambiguity. The flat-list version described \"occasionally succeeds, occasionally times out.\" An agent misunderstood and gave the wrong fix. The decision table directly says what to do — no room for misinterpretation.\n3. Decision tables make agents honest. When no table row perfectly matched the symptoms, the agent said \"partial match\" instead of forcing a wrong answer. Flat lists don't encourage this kind of honesty.\nThe whole project is up on my repo — feedback welcome: github.com/seanyan1984/skill-pitfalls\nIt's framework-agnostic — the rules work for any markdown-based skill or prompt documentation. If your error knowledge is growing wild in your agent skills, give it a try.", "url": "https://wpnews.pro/news/why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix", "canonical_source": "https://dev.to/_10e34d2463b4a0aecf191/why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix-1j43", "published_at": "2026-05-20 10:12:02+00:00", "updated_at": "2026-05-20 10:34:41.895739+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "developer-tools"], "entities": ["Anthropic", "SKILL.md"], "alternates": {"html": "https://wpnews.pro/news/why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix", "markdown": "https://wpnews.pro/news/why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix.md", "text": "https://wpnews.pro/news/why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix.txt", "jsonld": "https://wpnews.pro/news/why-your-ai-agent-keeps-making-the-same-mistakes-and-a-structured-fix.jsonld"}}