{"slug": "recursive-ai-research-skill-for-claude-code-openclaw-codex", "title": "Recursive AI Research Skill for Claude Code / OpenClaw / Codex", "summary": "A new open-source SKILL.md file enables coding agents like Claude Code, OpenClaw, and Codex to autonomously execute the full ML/AI research loop—from hypothesis to paper—while logging mistakes as lessons to prevent repetition. The recursive design helps researchers avoid common pitfalls such as leaked labels, single-seed wins, and hallucinated citations, with the skill improving over time via a LESSONS.md file.", "body_md": "**One SKILL.md that makes any coding agent move fast through the full ML/AI research loop — and get better every time it makes a mistake.**\n\nHypothesis → literature review → reproduce baseline → leak-free experiments → honest analysis → paper. Works with [Claude](https://claude.ai) / Claude Code, [OpenAI Codex](https://openai.com), and [OpenClaw](https://github.com/openclaw/openclaw). Recursive by design: it logs its own mistakes as rules so it never repeats them.\n\n⭐ **If this saves you from one leaked-label result, please star the repo — it helps other researchers find it.**\n\nMost AI-agent \"research\" help is a chatbot that sounds confident and cites papers that don't exist. Real research fails in specific, boring, expensive ways: a baseline you quoted instead of ran, a metric that's secretly leaking the label, a \"gain\" that's really one lucky seed, a citation invented from memory.\n\nThis skill encodes the discipline that catches those failures **before** they cost you a month — as a portable `SKILL.md`\n\nyour agent reads automatically. And it's **recursive**: when the agent makes a mistake and fixes it, it writes the lesson to `LESSONS.md`\n\n, reads that file at the start of every future task, and stops repeating itself. The skill you use in month three is sharper than the one you installed.\n\n| Stage | What the agent does | Guardrail it enforces |\n|---|---|---|\nFrame |\nTurns a vague idea into a testable hypothesis + a stated delta vs prior work | No experiment until the claim is one sentence |\nReview |\nFinds the 5–15 papers that matter, builds a comparison matrix | Cite only papers actually read — never from memory |\nReproduce |\nRuns the strongest baseline on your setup first |\nYou need a ruler before you measure a gain |\nDesign |\nSets seeds, fixes splits, runs a full leakage audit |\nSuspiciously-good ≠ breakthrough — prove it's not leakage |\nRun |\nScaffolds configs so every number is reproducible | The config is the single source of truth |\nAnalyze |\nCompares vs baseline with mean ± std over ≥3 seeds | A single-seed win is a story, not a finding |\nWrite |\nBacks every claim with a number; ships a reproducibility checklist | Never drop the seed/dataset that hurt the story |\n\nClone it, then drop it where your agent looks for skills:\n\n```\ngit clone https://github.com/<your-username>/ai-research-skill.git\n```\n\n**Claude / Claude Code / Claude Cowork**\n\nInstall the folder (or a packaged `.skill`\n\nbundle) into your skills directory. Claude keeps the `name`\n\n+ `description`\n\nin context always, and loads the full skill when your task looks like AI/ML research. Then just work normally — *\"help me reproduce this paper's baseline\"*, *\"why is my F1 suspiciously high?\"* — and it kicks in.\n\n**OpenClaw 🦞**\n\n```\ncp -r ai-research-skill ~/.openclaw/workspace/skills/ai-research\n```\n\nOpenClaw reads the same `SKILL.md`\n\nfrontmatter + body, and its injected `AGENTS.md`\n\npath lands in the same place. Inspect any skill before installing it — treat community skills like npm packages from strangers.\n\n**Codex & other AGENTS.md agents**\n\nKeep `AGENTS.md`\n\nat your repo root (it's a thin pointer to `SKILL.md`\n\n). Codex reads `AGENTS.md`\n\nand follows the skill from there.\n\n```\n   start task ─▶ read LESSONS.md ─▶ do research ─▶ made a mistake?\n        ▲                                              │ yes\n        │                                              ▼\n        └────────── LESSONS.md now has a new rule ◀── log_lesson.py\n```\n\nWhen the agent catches an error, it runs:\n\n```\npython scripts/log_lesson.py \\\n  --trigger \"reported a gain from one training run\" \\\n  --mistake \"claimed 'beats baseline' from a single seed\" \\\n  --fix     \"re-ran 3 seeds; gain was inside the noise band\" \\\n  --rule    \"no comparison claim without mean ± std over >=3 seeds\" \\\n  --tags    \"seeds,reproducibility\"\n```\n\nThe script **dedupes** similar rules, **counts** repeats, flags a rule for\n**promotion** into `SKILL.md`\n\nonce it's seen 3×, and tells you when the log needs\n**pruning**. A built-in guardrail refuses any \"lesson\" that would weaken research\nintegrity (fabricating, hiding, or cherry-picking results). The loop makes the\nskill smarter — it can't make it dishonest.\n\n```\nai-research-skill/\n├── SKILL.md                       # the skill (source of truth, Claude + OpenClaw)\n├── AGENTS.md                      # thin pointer for Codex / OpenClaw\n├── LESSONS.md                     # recursive memory — mistakes → rules\n├── scripts/\n│   ├── log_lesson.py              # append a deduped, counted lesson\n│   └── new_experiment.py          # scaffold a reproducible experiment dir\n└── references/\n    ├── literature-review.md       # find prior art fast; build the matrix\n    ├── experiment-design.md       # reproduce, seeds, the leakage audit\n    └── paper-writing.md           # claim→evidence + reproducibility checklist\n```\n\nPRs welcome — especially new `LESSONS.md`\n\nentries from real research mistakes\n(that's the whole point), new reference playbooks, and adapters for more agents.\nOpen an issue with the failure mode you hit and the rule that fixes it.\n\nAI research agent · machine learning research assistant · Claude skill · Claude Code skill · SKILL.md · Codex AGENTS.md · OpenClaw skill · self-improving agent · recursive AI agent · reproducible ML · data leakage detection · experiment tracking · ablation study · literature review automation · paper writing · LLM research workflow · NLP research · research reproducibility.\n\n[MIT](/Toadoum/ai-research-skill/blob/main/LICENSE) — use it, fork it, ship it.\n\n*Built for researchers who'd rather find the leak on day one than in review.*\n**⭐ Star it if it helps.**", "url": "https://wpnews.pro/news/recursive-ai-research-skill-for-claude-code-openclaw-codex", "canonical_source": "https://github.com/Toadoum/ai-research-skill", "published_at": "2026-07-01 14:40:29+00:00", "updated_at": "2026-07-01 14:50:55.095437+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "ai-research", "ai-agents", "developer-tools"], "entities": ["Claude", "OpenAI Codex", "OpenClaw", "Claude Code", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/recursive-ai-research-skill-for-claude-code-openclaw-codex", "markdown": "https://wpnews.pro/news/recursive-ai-research-skill-for-claude-code-openclaw-codex.md", "text": "https://wpnews.pro/news/recursive-ai-research-skill-for-claude-code-openclaw-codex.txt", "jsonld": "https://wpnews.pro/news/recursive-ai-research-skill-for-claude-code-openclaw-codex.jsonld"}}