{"slug": "trace-to-training-how-agent-runs-become-learning-data", "title": "Trace-to-Training: how agent runs become learning data", "summary": "WasmAgent introduces a framework that converts agent execution traces into training data for supervised fine-tuning (SFT) and direct preference optimization (DPO) without human labeling. Its compliance engine evaluates runs, ranks outcomes, and exports typed ComplianceEvalRecords, with its full repair loop (full_pcl) achieving a 54.7% pass rate on IFEval with Qwen2.5-1.5B-Q4, an 8.7 percentage point improvement over prompt retry. The system uses compliance verification as the reward signal, enabling models to learn from failure traces.", "body_md": "Every agent run is a data point. Most frameworks throw it away.\n\nWasmAgent keeps it: each run is evaluated by the compliance engine, ranked by outcome, and exported as a typed `ComplianceEvalRecord` ready for SFT or DPO training. No human labeling.\n\n```js\nimport { ComplianceRun } from \"@wasmagent/compliance\";\n\nconst run = new ComplianceRun({\n  mode: \"full_pcl\",   // \"direct\" | \"prompt_retry\" | \"full_pcl\"\n  taskSpec: {\n    instruction: \"Write a summary in exactly 3 bullet points.\",\n    constraints: [{ type: \"format\", rule: \"bullet_count\", value: 3 }],\n  },\n});\n\n// agent and input: your own agent instance and task input (defined elsewhere)\nconst result = await run.execute(agent, input);\n// result.complianceEvalRecord → typed, versioned, schema-validated\n```\n\n**direct**: one shot; record pass/fail.\n\n**prompt_retry**: retry once with a rephrased prompt.\n\n**full_pcl**: the full repair loop (run → evaluate → patch/regenerate → re-evaluate), recording the entire trace.\n\nIFEval × Qwen2.5-1.5B-Q4 (3 seeds × 50 samples):\n\n| Mode | Pass rate | Std dev |\n|---|---|---|\n| prompt_retry | 46.0% | ±2.0pp |\n| full_pcl | 54.7% | ±1.2pp |\n\n+8.7pp for `full_pcl` over `prompt_retry`. 
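The delta is easier to read as sample counts. With 3 seeds × 50 samples, each mode was scored on 150 outputs, so a reported pass rate pins down a whole number of passing outputs. A minimal sketch of that back-calculation, assuming every seed scores exactly 50 samples (so the mean of the per-seed rates equals the pooled rate); `passCountFor` is a hypothetical helper, not part of `@wasmagent/compliance`:

```js
// Benchmark shape from the table caption: 3 seeds × 50 samples per mode.
// Assumption: every seed scores exactly 50 samples, so the mean of the
// per-seed pass rates equals the pooled pass rate over all 150 outputs.
const SEEDS = 3;
const SAMPLES_PER_SEED = 50;
const TOTAL = SEEDS * SAMPLES_PER_SEED; // 150

// Hypothetical helper: the pass count whose pooled rate is within 0.05pp
// of the reported one-decimal percentage (null if no count matches).
function passCountFor(reportedPct) {
  for (let k = 0; k <= TOTAL; k++) {
    if (Math.abs((100 * k) / TOTAL - reportedPct) < 0.05) return k;
  }
  return null;
}

const retryPasses = passCountFor(46.0); // 69 of 150
const pclPasses = passCountFor(54.7);   // 82 of 150
const gain = pclPasses - retryPasses;   // 13 more passing outputs

console.log(`prompt_retry: ${retryPasses}/${TOTAL}`);
console.log(`full_pcl:     ${pclPasses}/${TOTAL}`);
console.log(`gain: ${gain} outputs = +${((100 * gain) / TOTAL).toFixed(1)}pp`);
```

Under that assumption, +8.7pp is 13 more passing outputs out of 150 (69 → 82). At 50 samples per seed, a single output moves one seed's pass rate by 2pp.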
The variance drop (±2.0pp → ±1.2pp) matters for production reliability: `full_pcl` is not only better on average but also more consistent across runs.\n\nReproduce: `bun packages/compliance/benchmarks/ifeval/run.ts --limit=50 --seed=42`\n\nWhen `full_pcl` repairs a failing output, `RepairPlanner` records every attempt:\n\n```js\n// Inside ComplianceEvalRecord\nattempts: [\n  { strategy: \"direct\",     output: \"...\", passed: false },\n  { strategy: \"patch\",      output: \"...\", passed: false },\n  { strategy: \"regenerate\", output: \"...\", passed: true  },\n]\n```\n\nThe full sequence (what failed, what was tried, what worked) is what feeds DPO training. The model learns from failure traces, not just final outputs.\n\n```js\nimport { RolloutForkRunner, RolloutRanker } from \"@wasmagent/core\";\n\n// forks: 4 → four rollouts of the same task; taskSpec as in the ComplianceRun example\nconst runner = new RolloutForkRunner({ forks: 4 });\nconst rollouts = await runner.run(agent, input, taskSpec);\n\nconst ranked = new RolloutRanker().rank(rollouts);\n// ranked[0] → chosen (SFT)\n// ranked.slice(1) → rejected, each paired with ranked[0] for DPO\n```\n\nThe compliance verifier is the reward signal. 
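In data terms, a recorded repair trace reduces to ordinary SFT and DPO rows: the passing output becomes the chosen completion, and each failing output becomes a rejected one paired with it. A minimal sketch, assuming the `attempts` shape shown above; `toTrainingRows` is a hypothetical helper, not part of WasmAgent, and the `prompt` / `chosen` / `rejected` field names follow the common preference-data convention used by libraries such as Hugging Face TRL rather than a documented WasmAgent export format:

```js
// Hypothetical helper: turn one recorded repair trace into training rows.
// Assumes each attempt looks like { strategy, output, passed } as in the
// ComplianceEvalRecord excerpt; the real schema may carry more fields.
function toTrainingRows(prompt, attempts) {
  const chosen = attempts.find((a) => a.passed);
  // No passing attempt means no preferred output, so emit nothing.
  if (!chosen) return { sft: [], dpo: [] };

  const sft = [{ prompt, completion: chosen.output }];
  // Every failed attempt becomes a rejected sample paired with the first passing one.
  const dpo = attempts
    .filter((a) => !a.passed)
    .map((a) => ({ prompt, chosen: chosen.output, rejected: a.output }));
  return { sft, dpo };
}

// Illustrative trace (placeholder outputs, not real model text).
const attempts = [
  { strategy: 'direct', output: 'two bullets', passed: false },
  { strategy: 'patch', output: 'four bullets', passed: false },
  { strategy: 'regenerate', output: 'three bullets', passed: true },
];

const rows = toTrainingRows('Write a summary in exactly 3 bullet points.', attempts);
console.log(rows.sft.length, rows.dpo.length);
console.log(rows.dpo.map((r) => r.rejected));
```

Here the two failed attempts yield two DPO pairs and the passing regenerate output yields one SFT row; every label comes from the verifier's `passed` flag.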
No human annotation.\n\n```sh\ngit clone https://github.com/WasmAgent/wasmagent-js\ncd wasmagent-js\nbun install\nbun test packages/compliance/   # 113 pass / 0 fail\n```\n\n**Code:** [packages/compliance](https://github.com/WasmAgent/wasmagent-js/tree/main/packages/compliance) · [RolloutForkRunner](https://github.com/WasmAgent/wasmagent-js/tree/main/packages/core/src/enhancement) · [RolloutRanker](https://github.com/WasmAgent/wasmagent-js/tree/main/packages/core/src/ranking)\n\n*Series: AEP (part 1) · MCP Trust Pack (part 2) · Trace-to-Training (part 3)*", "url": "https://wpnews.pro/news/trace-to-training-how-agent-runs-become-learning-data", "canonical_source": "https://dev.to/telleroutlook/trace-to-training-how-agent-runs-become-learning-data-31c4", "published_at": "2026-06-26 01:50:39+00:00", "updated_at": "2026-06-26 02:03:33.874418+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "developer-tools", "ai-agents"], "entities": ["WasmAgent", "ComplianceEvalRecord", "IFEval", "Qwen2.5-1.5B-Q4", "RolloutForkRunner", "RolloutRanker", "RepairPlanner", "ComplianceRun"], "alternates": {"html": "https://wpnews.pro/news/trace-to-training-how-agent-runs-become-learning-data", "markdown": "https://wpnews.pro/news/trace-to-training-how-agent-runs-become-learning-data.md", "text": "https://wpnews.pro/news/trace-to-training-how-agent-runs-become-learning-data.txt", "jsonld": "https://wpnews.pro/news/trace-to-training-how-agent-runs-become-learning-data.jsonld"}}