Ruflo × MetaHarness Integration: OIA-Layered Walkthrough (ADR-150 implementation across 28 /loop iterations)

Ruflo and MetaHarness have integrated an OIA-layered walkthrough across 28 iterations, implementing ADR-150 to improve AI-agent platform routing. The new learned router uses k-NN, kernel ridge regression, and FastGRNN to select the cheapest model meeting a quality bar, with parallel decision logging and a conservative promotion gate requiring >2% quality improvement, <1% cost increase, and <5% p95 latency increase. The OIA manifest reports layer statuses from full to not-applicable, with the router enhancing L3 Models and L5 Agent Orchestration.

Plain-language guide to what we built across 28 iterations of /loop work. Companion to ADR-150 · Tracking issue 2399 · Upstream bug 9 OIA = Open Infrastructure Architecture. It's a 9-layer mental model for what makes an AI-agent platform work. Like the OSI model for networking, but for agent harnesses. Each layer answers one question: | Layer | The question it answers | |---|---| | L1 — Physical Compute | Where does the work physically run? | | L2 — Data and Storage | Where do memories and artifacts live? | | L3 — Models | Which language models are available? | | L4 — Tools and Integrations | What can agents reach out and touch? | | L5 — Agent Orchestration | How do agents coordinate? | | L6 — Workflow and Automation | What runs automatically without a human? | | L7 — Governance and Policy | What's allowed, and what's blocked? | | L8 — Observability and Audit | What can we see after the fact? | | L9 — Human and Browser Interface | How do humans and other systems interact? | Plus six "horizontal spans" that cut across every layer: security, observability, identity, governance, policy enforcement, interoperability. The metaharness CLI emits an OIA manifest for any repo: harness oia-manifest . reports each layer as full , partial , none , or not-applicable . That's the standard we'll measure against. L1 Physical Compute not-applicable cloud-agnostic; runs anywhere Node 20+ does L2 Data and Storage partial AgentDB + HNSW vectors + memory namespaces L3 Models FULL haiku/sonnet/opus + OpenRouter alts + KRR routing L4 Tools and Integrations partial 314 MCP tools registered, MCP policy missing L5 Agent Orchestration FULL 16 agent roles, swarm + hive-mind + claims L6 Workflow and Automation partial 17 hooks + 12 background workers + cron L7 Governance and Policy partial mcp-policy.json declared, witness missing L8 Observability and Audit partial cost-tracker stack, audit log not enabled L9 Human and Browser UI partial CLI mature, no dashboard yet That's the WHOLE-RUFLO picture. The ADR-150 work shipped this iteration improves several layers — let's walk through which ones. Ruflo already had a rule-based 3-tier router haiku for simple, sonnet for medium, opus for hard . ADR-148/149 wanted a learned router that picks the cheapest model that still meets a quality bar. The implementation has three parts: - The router itself — @metaharness/router is an external library that does k-NN over a seed corpus, KRR kernel ridge regression with cross-validated lambda, and optional native FastGRNN. Ruflo loads it lazily behind a triple gate env flag + artifact + import success . When it's available, the routing decision carries routedBy: 'metaharness-knn' | 'metaharness-krr' | 'fastgrnn' . - Parallel decision logging iters 10–12 . Every time the bandit Thompson sampling picks a model, the SelfEvolvingRouter a competing arm also picks one. Both picks + the actual outcome go to .swarm/router-parallel.jsonl . Only writes when CLAUDE FLOW ROUTER PARALLEL LOG=1 . Default-path overhead measured at 147ns per route call — imperceptible. - Promotion gate iter 10 . An analyzer reads the JSONL and decides whether to promote SER over the bandit. The gate is conservative on purpose — three criteria, all must pass: quality improvement 2% AND cost increase < 1% AND p95 latency increase < 5% The AND matters: a quality gain that comes with a cost regression doesn't count. This came from review-round-1 of the ADR; the original draft had OR , which could have hidden cost regressions behind quality wins. Why it matters: routing is the most cost-sensitive decision the platform makes. A bad router quietly burns money for a long time before anyone notices. A learned router that's measurably better — not just believed to be — is real money. ADR-150's static-analysis surface is now reachable from Claude Code agents as first-class MCP tools: mcp claude-flow metaharness score "How harness-ready is this repo?" mcp claude-flow metaharness genome "What kind of repo is this, structurally?" mcp claude-flow metaharness mcp scan "Any MCP-config security issues?" mcp claude-flow metaharness threat model "Worst threat in our agent surface?" mcp claude-flow metaharness oia audit "Snapshot all of the above to memory" mcp claude-flow metaharness audit list "What audits exist?" mcp claude-flow metaharness audit trend "Has anything gotten worse since last audit?" These didn't exist before iter 20. Now an agent can ask "should this repo become a custom harness?" and get a quantitative answer from inside its normal tool-use loop — no shell-out to a separate CLI required. Verification depth: the iter-23 runtime test invokes every handler with minimal input and asserts each returns {success, data, degraded, exitCode} without throwing. 65 assertions across 7 tools. CI runs this on every PR via metaharness-ci.yml . The npx ruflo eject command is the most user-visible new capability. It takes the calling ruflo project and lifts it into a renamed standalone harness. The metaharness CLI does the heavy lifting metaharness --from-existing , ruflo provides safety gates: Dry-run by default. Just shows the plan. No --confirm , no writes. Refuses to write to the calling repo. Default target is /tmp/ruflo-eject-<ts -<name / . If you pass --target and it lands inside the repo, the command refuses with exit 2. Refuses to overwrite. Target dir must not exist. 10-minute hard timeout. No hung subprocesses. Why it matters: this is the user's exit ramp. Without an eject command, adopting ruflo feels like a one-way decision. With it, the message is "use ruflo, and if you outgrow it, take a focused version with you." Paradoxically, the exit ramp increases willingness to commit. Iters 7, 8, 15, 16 built a complete audit pipeline that runs on its own: bundles three orthogonal static checks — oia-audit oia-manifest , threat-model , mcp-scan — into one timestamped record with a composite worst severity. Writes to the metaharness-audit memory namespace. Weekly cron .github/workflows/oia-audit-weekly.yml fires the audit every Sunday at 04:17 UTC. Artifacts kept 90 days. Fails the workflow if composite worst hits HIGH.enumerates records. Show me the last 10, the last 30 days, etc. audit-list diffs any two records. Surfaces the composite-severity delta, per-component status drift, findings introduced vs cleared. audit-trend The shape mirrors ruflo's existing cost-tracker observability track → list → diff . Same mental model: snapshot continuously, compare on demand. This is the most important contribution of the whole ADR. It's not a feature — it's a rule that every feature has to honor: Ruflo remains operational if every MetaHarness package is removed. Four enforcement mechanisms: Removable. npm ls --without @metaharness/ must still produce a working CLI. Verified by static dep-grep on every PR. Optional in package.json. Every @metaharness/ is in optionalDependencies , never dependencies . Static grep on every PR. Graceful degradation. Every code path that touches MetaHarness catches MODULE NOT FOUND and emits a structured {degraded: true} payload. Smoke greps assert this in source. CI gate. .github/workflows/no-metaharness-smoke.yml simulates an unresolvable npm registry, runs every skill, and asserts each exits 0 + emits degraded JSON. Catches anything the static checks miss. Why this matters: if you don't write this rule down explicitly, integrations slowly turn into hard dependencies. Tomorrow a clever optimization "just for performance", next year a kernel that "really should be present". The rule freezes a permanent boundary: MetaHarness is augmentation, not foundation. ADR-150 has the deepest verification coverage of any feature in ruflo : Compile-time — tsc clean on every TS source file Structural — smoke contract greps source for safe patterns 35 invariants in the metaharness plugin Compat tripwire — scripts/check-metaharness-compat.mjs exercises the upstream API surface 9/9 against current router; catches breaking changes BEFORE we publish a release that would break at runtime End-to-end pipeline — test-parallel-pipeline.mjs 25 assertions : recorder JSON shape → analyzer reads → 3-criteria AND-gate evaluates → strict-mode exit code matches verdict MCP runtime contract — test-mcp-tools.mjs 65 assertions : every tool handler invoked, returns {success, data, degraded, exitCode} without throwing Graceful-degradation drill — points each skill at an unresolvable registry, asserts exit 0 + degraded JSON GCP-secret × OpenRouter × scaffold × lifecycle — test-with-openrouter.mjs 11 assertions : fetch secret, auth against OpenRouter 337 models verified , scaffold a harness, run doctor + score + genome + mcp-scan, cleanup Performance — bench-recordpair-overhead.mjs measures the iter-12 dispatch overhead at 147ns/call and gates regressions at 500ns Every claim in the implementation has either compile-time proof, structural proof, runtime proof, measured proof, or cross-cutting proof. ADR-150 is visible at every standard ruflo entry point: npx ruflo init Next-steps tip recommends running metaharness score npx ruflo doctor lists MetaHarness ADR-150 as a check with version + install hint npx ruflo metaharness <subcommand dispatches all 8 subcommands score, genome, mcp-scan, threat-model, oia-audit, audit-list, audit-trend, mint npx ruflo eject is its own top-level command npx ruflo plugins list --type harness filters the registry to MetaHarness-generated harnesses npx claude-flow hooks worker dispatch help text mentions oia-audit alongside the 12 original workers- Project CLAUDE.md has a comprehensive section so future agents discover the integration on file load Three days into the integration, the test from "can we test the harnesses using the OpenRouter token from GCP secret?" found a 26-iteration-old bug masked by our own graceful-degradation path. The bug: harness.mjs::runMetaharness was passing '-y metaharness@latest' as a single argv element to npx via spawnSync 'npx', bin, ...argv . spawnSync with shell:false doesn't split on whitespace, so npx received one argument containing an embedded space. Every skill that used the shared bridge was silently degrading instead of running. Why it stayed hidden: ADR-150's architectural-constraint rule 3 says skills must emit {degraded: true} and exit 0 when MetaHarness isn't available. The buggy invocation triggered that path. Every smoke check passed; every test reported success; every CI workflow stayed green. The system was lying to itself. What broke the lie: an integration test that exercised the real path end-to-end and verified a real outcome. score against ruflo returned the actual harnessFit: 82 numbers from the upstream CLI, not the degraded payload. The lesson: graceful degradation is necessary, but it MUST be paired with proof that the non-degraded path also works. The iter-19 drill assert degraded:true when metaharness is genuinely absent wasn't enough on its own. The iter-26 drill assert real-data:true when metaharness IS present was the missing half. Both halves are now wired into CI. Three real workflows the integration enables: npx ruflo metaharness score /path/to/repo Returns a scorecard in 500ms. harnessFit ≥ 70 and scaffoldReady: true means "yes, you've got the bones". recommendedMode tells you whether CLI-only or CLI+MCP makes sense. template recommends the closest pre-built vertical legal, devops, support, etc. . npx ruflo eject --name my-harness dry-run first — see the plan npx ruflo eject --name my-harness --confirm then commit cd /tmp/ruflo-eject-<ts -my-harness npm install npx harness doctor You get a renamed standalone harness with attribution preserved. Take it, modify it, publish it, never look at ruflo again — or stay and use both. In CI, weekly: node plugins/ruflo-metaharness/scripts/oia-audit.mjs --alert-on-worst high Locally, ad-hoc: npx ruflo metaharness audit-list --since 30d npx ruflo metaharness audit-trend --baseline-key <a --current-key <b --alert-on-worsening You see drift in the MCP threat model, find findings introduced or cleared, and the worst-severity verdict tells you whether the trajectory is OK. KRR retrain from production trajectories — wired but data-blocked. Needs 50+ real routing decisions; pipeline ready to consume the data when we have it.— touching 314 tools at v0.1.0 of an upstream package is too high blast radius. Deferred to a follow-up ADR. @metaharness/kernel ToolDispatcher as primary MCP dispatch Phase 3 Harness Intelligence Layer — explicitly scope-only in ADR-150: genome similarity search, harness recommendation, fleet drift, cross-harness capability graph, plugin compatibility. Each needs its own ADR. Direct merge to main — work is on feat/metaharness-integration-research branch, 28 iterations of commits, ready for review. Branch: feat/metaharness-integration-research Iterations of /loop: 28 Commits: ~28 Net new files: ~20 plugin + tests + scripts + CI + ADR Plugin skills: 6 score, genome, mcp-scan, threat-model, oia-audit, mint CLI subcommands: 8 5 above + audit-list + audit-trend + mint MCP tools: 7 same 5 + audit-list + audit-trend CI workflows: 3 dedicated + integration with v3-ci.yml Smoke invariants: 35 in the plugin + 422+ fleet-wide Verification layers: 7 tsc, structural, compat, e2e pipeline, MCP runtime, graceful drill, GCP-OpenRouter integration Default-path overhead measured: 147ns per route call Architectural-constraint violations to date: 0 Upstream bugs filed: 1 --target flag ignored The integration is complete enough that the only outstanding work needs production data, not more code. Decision : v3/docs/adr/ADR-150-metaharness-integration-surfaces.md Plan : Issue 2399 — phase tracker https://github.com/ruvnet/ruflo/issues/2399 Research dossier : original gist https://gist.github.com/ruvnet/19d166ff9acf368c9da4172d91ac9113 graded evidence per claim Upstream bug : ruvnet/agent-harness-generator 9 Branch : feat/metaharness-integration-research MetaHarness upstream : ruvnet/agent-harness-generator