# Trace-to-Training: how agent runs become learning data

> Source: <https://dev.to/telleroutlook/trace-to-training-how-agent-runs-become-learning-data-31c4>
> Published: 2026-06-26 01:50:39+00:00

Every agent run is a data point. Most frameworks throw it away.

WasmAgent keeps it — evaluated by the compliance engine, ranked by outcome, exported as a typed `ComplianceEvalRecord`

ready for SFT or DPO training. No human labeling.

``` js
import { ComplianceRun } from "@wasmagent/compliance";

const run = new ComplianceRun({
  mode: "full_pcl",   // "direct" | "prompt_retry" | "full_pcl"
  taskSpec: {
    instruction: "Write a summary in exactly 3 bullet points.",
    constraints: [{ type: "format", rule: "bullet_count", value: 3 }],
  },
});

const result = await run.execute(agent, input);
// result.complianceEvalRecord → typed, versioned, schema-validated
```

** direct** — one shot, record pass/fail.

** prompt_retry** — retry once with a rephrased prompt.

** full_pcl** — full repair loop: run → evaluate → patch/regenerate → re-evaluate → record the entire trace.

IFEval × Qwen2.5-1.5B-Q4 (3 seeds × 50 samples):

| Mode | Pass rate | Std dev |
|---|---|---|
| prompt_retry | 46.0% | ±2.0pp |
full_pcl |
54.7% |
±1.2pp |

+8.7pp. The variance drop (±2.0 → ±1.2) matters for production reliability.

Reproduce: `bun packages/compliance/benchmarks/ifeval/run.ts --limit=50 --seed=42`

When `full_pcl`

repairs a failing output, `RepairPlanner`

records every attempt:

```
// Inside ComplianceEvalRecord
attempts: [
  { strategy: "direct",     output: "...", passed: false },
  { strategy: "patch",      output: "...", passed: false },
  { strategy: "regenerate", output: "...", passed: true  },
]
```

The full sequence — what failed, what was tried, what worked — is what feeds DPO training. The model learns from failure traces, not just final outputs.

``` js
import { RolloutForkRunner, RolloutRanker } from "@wasmagent/core";

const runner = new RolloutForkRunner({ forks: 4 });
const rollouts = await runner.run(agent, input, taskSpec);

const ranked = new RolloutRanker().rank(rollouts);
// ranked[0] → chosen (SFT)
// ranked[1..] → rejected (DPO pairs)
```

The compliance verifier is the reward signal. No human annotation.

```
git clone https://github.com/WasmAgent/wasmagent-js
bun test packages/compliance/   # 113 pass / 0 fail
```

**Code:** [packages/compliance](https://github.com/WasmAgent/wasmagent-js/tree/main/packages/compliance) · [RolloutForkRunner](https://github.com/WasmAgent/wasmagent-js/tree/main/packages/core/src/enhancement) · [RolloutRanker](https://github.com/WasmAgent/wasmagent-js/tree/main/packages/core/src/ranking)

*Series: AEP (part 1) · MCP Trust Pack (part 2) · Trace-to-Training (part 3)*
