Recursive AI Research Skill for Claude Code / OpenClaw / Codex

wpnews.pro

cd /news/artificial-intelligence/recursive-ai-research-skill-for-clau… · home › topics › artificial-intelligence › article

[ARTICLE · art-46812] src=github.com ↗ pub=2026-07-01T14:40Z topic=artificial-intelligence verified=true sentiment=↑ positive

Recursive AI Research Skill for Claude Code / OpenClaw / Codex

A new open-source SKILL.md file enables coding agents like Claude Code, OpenClaw, and Codex to autonomously execute the full ML/AI research loop—from hypothesis to paper—while logging mistakes as lessons to prevent repetition. The recursive design helps researchers avoid common pitfalls such as leaked labels, single-seed wins, and hallucinated citations, with the skill improving over time via a LESSONS.md file.

read4 min views1 publishedJul 1, 2026

Recursive AI Research Skill for Claude Code / OpenClaw / Codex — Image: source

One SKILL.md that makes any coding agent move fast through the full ML/AI research loop — and get better every time it makes a mistake.

Hypothesis → literature review → reproduce baseline → leak-free experiments → honest analysis → paper. Works with Claude / Claude Code, OpenAI Codex, and OpenClaw. Recursive by design: it logs its own mistakes as rules so it never repeats them.

⭐ If this saves you from one leaked-label result, please star the repo — it helps other researchers find it.

Most AI-agent "research" help is a chatbot that sounds confident and cites papers that don't exist. Real research fails in specific, boring, expensive ways: a baseline you quoted instead of ran, a metric that's secretly leaking the label, a "gain" that's really one lucky seed, a citation invented from memory.

This skill encodes the discipline that catches those failures before they cost you a month — as a portable SKILL.md

your agent reads automatically. And it's recursive: when the agent makes a mistake and fixes it, it writes the lesson to LESSONS.md

, reads that file at the start of every future task, and stops repeating itself. The skill you use in month three is sharper than the one you installed.

Stage	What the agent does	Guardrail it enforces
Frame
Turns a vague idea into a testable hypothesis + a stated delta vs prior work	No experiment until the claim is one sentence
Review
Finds the 5–15 papers that matter, builds a comparison matrix	Cite only papers actually read — never from memory
Reproduce
Runs the strongest baseline on your setup first
You need a ruler before you measure a gain
Design
Sets seeds, fixes splits, runs a full leakage audit
Suspiciously-good ≠ breakthrough — prove it's not leakage
Run
Scaffolds configs so every number is reproducible	The config is the single source of truth
Analyze
Compares vs baseline with mean ± std over ≥3 seeds	A single-seed win is a story, not a finding
Write
Backs every claim with a number; ships a reproducibility checklist	Never drop the seed/dataset that hurt the story

Clone it, then drop it where your agent looks for skills:

git clone https://github.com/<your-username>/ai-research-skill.git

Claude / Claude Code / Claude Cowork

Install the folder (or a packaged .skill

bundle) into your skills directory. Claude keeps the name

description

in context always, and loads the full skill when your task looks like AI/ML research. Then just work normally — "help me reproduce this paper's baseline", "why is my F1 suspiciously high?" — and it kicks in.

OpenClaw 🦞

cp -r ai-research-skill ~/.openclaw/workspace/skills/ai-research

OpenClaw reads the same SKILL.md

frontmatter + body, and its injected AGENTS.md

path lands in the same place. Inspect any skill before installing it — treat community skills like npm packages from strangers.

Codex & other AGENTS.md agents

Keep AGENTS.md

at your repo root (it's a thin pointer to SKILL.md

). Codex reads AGENTS.md

and follows the skill from there.

   start task ─▶ read LESSONS.md ─▶ do research ─▶ made a mistake?
        ▲                                              │ yes
        │                                              ▼
        └────────── LESSONS.md now has a new rule ◀── log_lesson.py

When the agent catches an error, it runs:

python scripts/log_lesson.py \
  --trigger "reported a gain from one training run" \
  --mistake "claimed 'beats baseline' from a single seed" \
  --fix     "re-ran 3 seeds; gain was inside the noise band" \
  --rule    "no comparison claim without mean ± std over >=3 seeds" \
  --tags    "seeds,reproducibility"

The script dedupes similar rules, counts repeats, flags a rule for promotion into SKILL.md

once it's seen 3×, and tells you when the log needs pruning. A built-in guardrail refuses any "lesson" that would weaken research integrity (fabricating, hiding, or cherry-picking results). The loop makes the skill smarter — it can't make it dishonest.

ai-research-skill/
├── SKILL.md                       # the skill (source of truth, Claude + OpenClaw)
├── AGENTS.md                      # thin pointer for Codex / OpenClaw
├── LESSONS.md                     # recursive memory — mistakes → rules
├── scripts/
│   ├── log_lesson.py              # append a deduped, counted lesson
│   └── new_experiment.py          # scaffold a reproducible experiment dir
└── references/
    ├── literature-review.md       # find prior art fast; build the matrix
    ├── experiment-design.md       # reproduce, seeds, the leakage audit
    └── paper-writing.md           # claim→evidence + reproducibility checklist

PRs welcome — especially new LESSONS.md

entries from real research mistakes (that's the whole point), new reference playbooks, and adapters for more agents. Open an issue with the failure mode you hit and the rule that fixes it.

AI research agent · machine learning research assistant · Claude skill · Claude Code skill · SKILL.md · Codex AGENTS.md · OpenClaw skill · self-improving agent · recursive AI agent · reproducible ML · data leakage detection · experiment tracking · ablation study · literature review automation · paper writing · LLM research workflow · NLP research · research reproducibility.

MIT — use it, fork it, ship it.

Built for researchers who'd rather find the leak on day one than in review. ⭐ Star it if it helps.

source & further reading

github.com — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/recursive-ai-research-sk…

Read original on github.com → github.com/Toadoum/ai-research-skill

mentioned entities

Claude

OpenAI Codex

OpenClaw

Claude Code

GitHub

metadata

slugrecursive-ai-research-skill-for-claude-code-openclaw-codex

topic#artificial-intelligence

secondary4 topics

sentimentpositive

canonicalgithub.com

navigation

← prevFrom Neo4j Fundamentals to Graph…

next →The Vertical Turn

── more in #artificial-intelligence 4 stories · sorted by recency

byteiota.com · 1 Jul · #artificial-intelligence

Vercel AI SDK 7: Production Agents That Survive Deploys

dev.to · 1 Jul · #artificial-intelligence

Stop your AI assistant from hallucinating PDF APIs: SelectPdf AI Tools

latent.space · 1 Jul · #artificial-intelligence

Warp CEO Zach Lloyd on why software factories are the next phase of coding

github.com · 1 Jul · #artificial-intelligence

ECAA-workflow: deterministic workflow compiler for FAIR bioinformatics

── more on @claude 3 stories trending now

wpnews · 30 May · #ai-tools

I was wasting 10 minutes every Claude session. So I built a fix.

wpnews · 27 May · #machine-learning

hunting for headroom on modded-nanoGPT (WR #82)

wpnews · 2 Jun · #ai-products

Microsoft launches Discovery platform for scientific R&D with Ginkgo Bioworks partnership

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required