Source: dev.to

Trace-to-Training: how agent runs become learning data

WasmAgent introduces a framework that converts agent execution traces into training data for supervised fine-tuning (SFT) and direct preference optimization (DPO) without human labeling. Its compliance engine evaluates runs, ranks outcomes, and exports typed ComplianceEvalRecords. On IFEval with Qwen2.5-1.5B-Q4, the full repair loop (full_pcl) reaches a 54.7% pass rate, 8.7 percentage points above prompt retry. Compliance verification serves as the reward signal, so models can learn from failure traces.

Published Jun 26, 2026 · 2 min read

Every agent run is a data point. Most frameworks throw it away.

WasmAgent keeps it: evaluated by the compliance engine, ranked by outcome, and exported as a typed ComplianceEvalRecord ready for SFT or DPO training. No human labeling.

import { ComplianceRun } from "@wasmagent/compliance";

const run = new ComplianceRun({
  mode: "full_pcl",   // "direct" | "prompt_retry" | "full_pcl"
  taskSpec: {
    instruction: "Write a summary in exactly 3 bullet points.",
    constraints: [{ type: "format", rule: "bullet_count", value: 3 }],
  },
});

// `agent` and `input` are not defined in this snippet; the caller supplies them.
const result = await run.execute(agent, input);
// result.complianceEvalRecord → typed, versioned, schema-validated

**direct**: one shot, record pass/fail.

**prompt_retry**: retry once with a rephrased prompt.

**full_pcl**: the full repair loop (run → evaluate → patch/regenerate → re-evaluate), recording the entire trace.
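To make the difference between the three modes concrete, here is a minimal sketch of the control flow they describe. It is not the `@wasmagent/compliance` implementation: `runMode`, `Hooks`, and the `"retry"` strategy label are hypothetical names introduced for illustration, the hooks are synchronous for brevity, and the patch-then-regenerate order is an assumption based on the trace example in the article.

```typescript
// Hypothetical sketch: none of these names come from @wasmagent/compliance.
type Mode = "direct" | "prompt_retry" | "full_pcl";

interface Attempt {
  strategy: "direct" | "retry" | "patch" | "regenerate";
  output: string;
  passed: boolean;
}

interface Hooks {
  generate: (prompt: string) => string; // call the model
  evaluate: (output: string) => boolean; // compliance check
  rephrase: (prompt: string) => string; // used by prompt_retry
  patch: (output: string) => string; // targeted fix of a failing output
}

function runMode(mode: Mode, prompt: string, hooks: Hooks): Attempt[] {
  const attempts: Attempt[] = [];
  // Evaluate an output, append it to the trace, and report whether it passed.
  const record = (strategy: Attempt["strategy"], output: string): boolean => {
    const passed = hooks.evaluate(output);
    attempts.push({ strategy, output, passed });
    return passed;
  };

  const first = hooks.generate(prompt);
  if (record("direct", first) || mode === "direct") return attempts;

  if (mode === "prompt_retry") {
    record("retry", hooks.generate(hooks.rephrase(prompt)));
    return attempts;
  }

  // full_pcl: try a patch first, then regenerate from scratch.
  if (record("patch", hooks.patch(first))) return attempts;
  record("regenerate", hooks.generate(prompt));
  return attempts;
}

// Toy hooks: the first generation has 2 bullets, later ones have 3.
const countBullets = (text: string): number =>
  text.split("\n").filter((line) => line.trim().startsWith("- ")).length;

let calls = 0;
const hooks: Hooks = {
  generate: () => (calls++ === 0 ? "- a\n- b" : "- a\n- b\n- c"),
  evaluate: (output) => countBullets(output) === 3,
  rephrase: (prompt) => `${prompt} Use exactly 3 bullets.`,
  patch: (output) => output, // a no-op patch, so the loop falls through to regenerate
};

const trace = runMode("full_pcl", "Write a summary in exactly 3 bullet points.", hooks);
console.log(trace.map((a) => `${a.strategy}:${a.passed}`).join(" "));
// direct:false patch:false regenerate:true
```

The point of the sketch is that only `full_pcl` leaves behind a multi-step trace with both failing and passing outputs, which is what makes it useful as training data.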

IFEval × Qwen2.5-1.5B-Q4 (3 seeds × 50 samples):

| Mode | Pass rate | Std dev |
|---|---|---|
| prompt_retry | 46.0% | ±2.0pp |
| full_pcl | 54.7% | ±1.2pp |

full_pcl gains 8.7pp over prompt_retry. The drop in standard deviation (±2.0pp → ±1.2pp) matters for production reliability.
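For readers who want to check this kind of summary themselves, the sketch below shows how a mean pass rate, a standard deviation across seeds, and a percentage-point delta are computed. The per-seed values are made up for illustration, since the article does not publish per-seed results, and the use of the sample (n − 1) standard deviation is an assumption about how the ± figures were derived.

```typescript
// Mean of a list of numbers.
function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Sample standard deviation (n - 1 denominator); assumed, not confirmed by the article.
function sampleStdDev(xs: number[]): number {
  const m = mean(xs);
  const sumSq = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return Math.sqrt(sumSq / (xs.length - 1));
}

// Made-up per-seed pass rates (%), NOT the benchmark's actual per-seed results.
const promptRetrySeeds = [44, 46, 48];
const fullPclSeeds = [54, 54, 56];

const delta = mean(fullPclSeeds) - mean(promptRetrySeeds);

console.log(
  `prompt_retry: ${mean(promptRetrySeeds).toFixed(1)}% ±${sampleStdDev(promptRetrySeeds).toFixed(1)}pp`,
);
console.log(
  `full_pcl: ${mean(fullPclSeeds).toFixed(1)}% ±${sampleStdDev(fullPclSeeds).toFixed(1)}pp`,
);
console.log(`delta: +${delta.toFixed(1)}pp`);
```

With 50 samples per seed, each seed's pass rate moves in steps of 2pp, so a spread of ±1 to 2pp across three seeds corresponds to a difference of only one or two samples per run.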

Reproduce: bun packages/compliance/benchmarks/ifeval/run.ts --limit=50 --seed=42

When full_pcl repairs a failing output, RepairPlanner records every attempt:

// Inside ComplianceEvalRecord
attempts: [
  { strategy: "direct",     output: "...", passed: false },
  { strategy: "patch",      output: "...", passed: false },
  { strategy: "regenerate", output: "...", passed: true  },
]

The full sequence — what failed, what was tried, what worked — is what feeds DPO training. The model learns from failure traces, not just final outputs.
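One way such a trace could be turned into training rows is sketched below. The article does not show the export format, so `PreferencePair`, `SftExample`, and both helper functions are hypothetical; only the `attempts` shape (`strategy`, `output`, `passed`) comes from the record shown above. The sketch pairs the first passing output against each failed attempt.

```typescript
// Attempt mirrors the fields shown in the article's ComplianceEvalRecord excerpt.
interface Attempt {
  strategy: string;
  output: string;
  passed: boolean;
}

// Hypothetical row shapes; the real export schema is not shown in the article.
interface PreferencePair {
  prompt: string;
  chosen: string;
  rejected: string;
}

interface SftExample {
  prompt: string;
  completion: string;
}

// Pair the first passing output against every failing attempt.
function toPreferencePairs(prompt: string, attempts: Attempt[]): PreferencePair[] {
  const winner = attempts.find((a) => a.passed);
  if (!winner) return []; // no passing output, so there is nothing to prefer
  return attempts
    .filter((a) => !a.passed && a.output !== winner.output)
    .map((a) => ({ prompt, chosen: winner.output, rejected: a.output }));
}

// The passing output alone can serve as an SFT target.
function toSftExample(prompt: string, attempts: Attempt[]): SftExample | null {
  const winner = attempts.find((a) => a.passed);
  return winner ? { prompt, completion: winner.output } : null;
}

const prompt = "Write a summary in exactly 3 bullet points.";
const attempts: Attempt[] = [
  { strategy: "direct", output: "- a\n- b", passed: false },
  { strategy: "patch", output: "- a\n- b\n- c\n- d", passed: false },
  { strategy: "regenerate", output: "- a\n- b\n- c", passed: true },
];

const pairs = toPreferencePairs(prompt, attempts);
console.log(pairs.length); // 2
console.log(toSftExample(prompt, attempts)?.completion === "- a\n- b\n- c"); // true
```

A trace with no passing attempt yields no pairs here, because there is no verified output to prefer. Whether the real exporter keeps such traces for other purposes is not stated in the article.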

import { RolloutForkRunner, RolloutRanker } from "@wasmagent/core";

const runner = new RolloutForkRunner({ forks: 4 });
const rollouts = await runner.run(agent, input, taskSpec);

const ranked = new RolloutRanker().rank(rollouts);
// ranked[0] → chosen (SFT)
// ranked[1..] → rejected (DPO pairs)

The compliance verifier is the reward signal. No human annotation.
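The idea of a verifier acting as the reward can be sketched in a few lines. This is not the `RolloutRanker` implementation: `check`, `reward`, and `rank` are hypothetical helpers, the `bullet_count` constraint shape follows the article's `taskSpec`, and `max_words` is a made-up second rule added so the ranking has more than two possible scores.

```typescript
// Constraint shape follows the article's taskSpec; "max_words" is a made-up second rule.
type Constraint =
  | { type: "format"; rule: "bullet_count"; value: number }
  | { type: "format"; rule: "max_words"; value: number };

interface Rollout {
  id: number;
  output: string;
}

// Programmatic check of a single constraint, with no human in the loop.
function check(output: string, c: Constraint): boolean {
  if (c.rule === "bullet_count") {
    const bullets = output.split("\n").filter((line) => /^\s*[-*]\s+\S/.test(line));
    return bullets.length === c.value;
  }
  const words = output.split(/\s+/).filter(Boolean);
  return words.length <= c.value;
}

// Reward = fraction of constraints satisfied, in [0, 1].
function reward(output: string, constraints: Constraint[]): number {
  if (constraints.length === 0) return 1;
  return constraints.filter((c) => check(output, c)).length / constraints.length;
}

// Highest reward first; ties keep their original order.
function rank(rollouts: Rollout[], constraints: Constraint[]): Rollout[] {
  return [...rollouts].sort(
    (a, b) => reward(b.output, constraints) - reward(a.output, constraints),
  );
}

const constraints: Constraint[] = [
  { type: "format", rule: "bullet_count", value: 3 },
  { type: "format", rule: "max_words", value: 12 },
];

const rollouts: Rollout[] = [
  { id: 0, output: "- a\n- b" },
  { id: 1, output: "- one\n- two\n- three" },
  { id: 2, output: "no bullets here at all and far too many words in this single rambling line okay" },
  { id: 3, output: "* x\n* y\n* z" },
];

const ranked = rank(rollouts, constraints);
const chosen = ranked[0];
// Only strictly worse rollouts make meaningful "rejected" sides of a DPO pair.
const rejected = ranked
  .slice(1)
  .filter((r) => reward(r.output, constraints) < reward(chosen.output, constraints));

console.log(ranked.map((r) => r.id).join(",")); // 1,3,0,2
console.log(rejected.map((r) => r.id).join(",")); // 0,2
```

Filtering out ties is a design choice of this sketch. Pairing two outputs that the verifier scores identically would give the preference objective a label with no signal behind it.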

git clone https://github.com/WasmAgent/wasmagent-js
cd wasmagent-js && bun install
bun test packages/compliance/   # 113 pass / 0 fail

Code: packages/compliance · RolloutForkRunner · RolloutRanker

Series: AEP (part 1) · MCP Trust Pack (part 2) · Trace-to-Training (part 3)
