Agent-ML-skills – Teach Codex/Claude/Cursor to stop making ML mistakes

Agent-ML-skills, a curated pack of 15 battle-tested machine learning skills, has been released to teach AI coding agents such as Codex, Claude Code, and Cursor how to avoid common ML mistakes like data leakage and scoring imbalanced data with accuracy. The skills install with a single command and provide expert guidance on tasks from exploratory data analysis to model serving, without bloating prompts. The aim is for agents to stop guessing and work like experienced ML engineers.

Production-grade Machine Learning, Data Science & MLOps skills for AI coding agents.

Coding agents are great generalists but make the same ML mistakes over and over: leaking preprocessing into cross-validation, scoring imbalanced data with accuracy, forgetting `model.eval()`, building RAG with dense-only retrieval. agent-ml-skills is a curated pack of 15 battle-tested skills that teach your agent how an experienced ML engineer actually works, so it stops guessing.

Works with Codex, Claude Code, Cursor, and OpenCode.

Install all skills into your agent with one command (no separate install step, no dependencies):

Codex:

    npx agent-ml-skills install --target codex

Claude Code:

    npx agent-ml-skills install --target claude

Cursor:

    npx agent-ml-skills install --target cursor --scope project

OpenCode:

    npx agent-ml-skills install --target opencode

Everything, everywhere:

    npx agent-ml-skills install --target all

Browse what's inside first:

    npx agent-ml-skills list

Then restart your agent or start a new session; it will pick up the right skill automatically when your task matches.

A skill is a single Markdown file whose YAML frontmatter tells the agent when to use it and whose body explains how to do the task well:

    ---
    name: sklearn-pipelines
    description: Use when building scikit-learn models that must not leak preprocessing...
    ---
    scikit-learn Pipelines
    ...workflow, code patterns, pitfalls, hand-off...
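As a rough illustration of that file layout, the sketch below splits a skill file into its frontmatter fields and its Markdown body using only the Python standard library. It is not how agent-ml-skills or any particular agent parses skills: `parse_skill` and the `Skill` dataclass are hypothetical names, and the parser assumes flat `key: value` frontmatter rather than full YAML.

```python
from dataclasses import dataclass

# Sample skill file mirroring the example above; the "..." parts are kept as elided.
SKILL_TEXT = """---
name: sklearn-pipelines
description: Use when building scikit-learn models that must not leak preprocessing...
---
scikit-learn Pipelines
...workflow, code patterns, pitfalls, hand-off...
"""


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter fields and Markdown body.

    Hypothetical helper: handles only flat `key: value` lines, not full YAML.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    if not lines or lines[0] != "---":
        raise ValueError("skill file must start with a '---' line")
    try:
        end = lines.index("---", 1)  # position of the closing '---'
    except ValueError:
        raise ValueError("frontmatter is not closed with '---'") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


skill = parse_skill(SKILL_TEXT)
print(skill.name)
print(skill.description)
```

Keeping `name` and `description` separate from `body` means a loader could scan the short fields of many skills cheaply and read a body only when it is needed. Real frontmatter can contain quoting, lists, and multi-line values, so a proper YAML parser would be the safer choice outside a sketch.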
Agents that support skills load the description up front and pull in the full body only when the task matches, so you get expert guidance without bloating every prompt.

| Skill | Use when… |
|---|---|
| `exploratory-data-analysis` | Starting on a new dataset: profiling, distributions, correlations, leakage & viz. |
| `data-cleaning` | Handling missing values, duplicates, types, outliers, with train-only imputation. |
| `feature-engineering` | Encoding, scaling, datetime/text/aggregation features, leakage-safe target encoding. |
| `pandas-patterns` | Writing idiomatic, vectorized, memory-efficient pandas (no `SettingWithCopyWarning`). |
| `imbalanced-data` | The target is rare (fraud/churn/disease): metrics, SMOTE, class weights, thresholds. |

| Skill | Use when… |
|---|---|
| `sklearn-pipelines` | Building scikit-learn models that must not leak preprocessing into CV. |
| `pytorch-training-loop` | Writing/reviewing a PyTorch loop: eval modes, AMP, checkpointing, devices. |
| `model-evaluation` | Choosing metrics, validating, calibration, confusion-matrix analysis. |
| `hyperparameter-tuning` | Optimizing params: random vs Optuna, leakage-safe CV, early stopping, budget. |

| Skill | Use when… |
|---|---|
| `llm-finetuning` | Fine-tuning an LLM: full vs LoRA/QLoRA, data formatting, transformers/PEFT/TRL. |
| `rag-pipeline` | Building RAG: chunking, embeddings, hybrid + reranking retrieval, eval. |

| Skill | Use when… |
|---|---|
| `experiment-tracking` | Experiments need comparing/reproducing: MLflow/W&B, what to log, registry. |
| `reproducible-ml` | A result must be reproducible: seeds, env pinning, data versioning, CUDA determinism. |
| `ml-debugging` | A model won't learn, loss is NaN, or metrics look too good: a diagnosis decision tree. |
| `model-serving` | Deploying behind an API: FastAPI, safe artifact loading, batching, ONNX, monitoring. |
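To make one of the mistakes this pack targets concrete, here is a small sketch of why accuracy misleads on imbalanced data. It is not taken from the `imbalanced-data` skill itself. The labels are synthetic, the "model" is a deliberately useless one that always predicts the majority class, and the helper functions are hypothetical pure-Python stand-ins for what a metrics library would normally provide.

```python
# Synthetic labels: 1 = rare positive class (e.g. fraud), 0 = majority class.
y_true = [1] * 5 + [0] * 95
# A useless "model" that always predicts the majority class.
y_pred = [0] * 100


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Of the samples whose true label is `positive`, the share predicted as such."""
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not preds_for_class:
        return 0.0
    return sum(p == positive for p in preds_for_class) / len(preds_for_class)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


acc = accuracy(y_true, y_pred)
pos_recall = recall(y_true, y_pred, positive=1)
bal_acc = balanced_accuracy(y_true, y_pred)

print(f"accuracy:          {acc:.2f}")         # 0.95
print(f"positive recall:   {pos_recall:.2f}")  # 0.00
print(f"balanced accuracy: {bal_acc:.2f}")     # 0.50
```

The classifier never finds a single positive case yet still scores 95% accuracy, which is why recall, precision, or balanced accuracy are usually the more informative numbers when the target is rare.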