# Ruflo × MetaHarness Integration: OIA-Layered Walkthrough (ADR-150 implementation across 28 /loop iterations)

> Source: <https://gist.github.com/ruvnet/9056701d13d5a5b5148d0459ff10b7c3>
> Published: 2026-06-16 19:43:54+00:00

**Plain-language guide to what we built across 28 iterations of /loop work.**

*Companion to ADR-150 · Tracking issue #2399 · Upstream bug #9*

**OIA = Open Infrastructure Architecture.** It's a 9-layer mental model for what makes an AI-agent platform work. Like the OSI model for networking, but for agent harnesses. Each layer answers one question:

| Layer | The question it answers |
|---|---|
| L1 — Physical Compute | Where does the work physically run? |
| L2 — Data and Storage | Where do memories and artifacts live? |
| L3 — Models | Which language models are available? |
| L4 — Tools and Integrations | What can agents reach out and touch? |
| L5 — Agent Orchestration | How do agents coordinate? |
| L6 — Workflow and Automation | What runs automatically without a human? |
| L7 — Governance and Policy | What's allowed, and what's blocked? |
| L8 — Observability and Audit | What can we see after the fact? |
| L9 — Human and Browser Interface | How do humans (and other systems) interact? |

Plus six "horizontal spans" that cut across every layer: **security, observability, identity, governance, policy enforcement, interoperability.**

The metaharness CLI emits an OIA manifest for any repo: `harness oia-manifest .` reports each layer as `full`, `partial`, `none`, or `not-applicable`. That's the standard we'll measure against.

```
L1 Physical Compute         not-applicable    (cloud-agnostic; runs anywhere Node 20+ does)
L2 Data and Storage         partial           (AgentDB + HNSW vectors + memory namespaces)
L3 Models                   FULL              (haiku/sonnet/opus + OpenRouter alts + KRR routing)
L4 Tools and Integrations   partial           (314 MCP tools registered, MCP policy missing)
L5 Agent Orchestration      FULL              (16 agent roles, swarm + hive-mind + claims)
L6 Workflow and Automation  partial           (17 hooks + 12 background workers + cron)
L7 Governance and Policy    partial           (mcp-policy.json declared, witness missing)
L8 Observability and Audit  partial           (cost-tracker stack, audit log not enabled)
L9 Human and Browser UI     partial           (CLI mature, no dashboard yet)
```

That's the WHOLE-RUFLO picture. The ADR-150 work shipped this iteration improves several layers — let's walk through which ones.

Ruflo already had a rule-based 3-tier router (haiku for simple, sonnet for medium, opus for hard). ADR-148/149 wanted a **learned** router that picks the cheapest model that still meets a quality bar. The implementation has three parts:

- **The router itself.** `@metaharness/router` is an external library that does k-NN over a seed corpus, KRR (kernel ridge regression) with cross-validated lambda, and optional native FastGRNN. Ruflo loads it lazily behind a triple gate (env flag + artifact + import success). When it's available, the routing decision carries `routedBy: 'metaharness-knn' | 'metaharness-krr' | 'fastgrnn'`.
- **Parallel decision logging** (iters 10–12). Every time the bandit (Thompson sampling) picks a model, the SelfEvolvingRouter (SER, a competing arm) also picks one. Both picks + the actual outcome go to `.swarm/router-parallel.jsonl`. Only writes when `CLAUDE_FLOW_ROUTER_PARALLEL_LOG=1`. **Default-path overhead measured at 147ns per route() call**, which is imperceptible.
- **Promotion gate** (iter 10). An analyzer reads the JSONL and decides whether to promote SER over the bandit. The gate is conservative on purpose, with three criteria that must all pass:

```
quality improvement > 2%   AND   cost increase < 1%   AND   p95 latency increase < 5%
```

The `AND` matters: a quality gain that comes with a cost regression doesn't count. (This came from review-round-1 of the ADR; the original draft had `OR`, which could have hidden cost regressions behind quality wins.)
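
The gate reduces to a single boolean conjunction. Here is a minimal sketch, assuming the analyzer has already reduced the JSONL to three relative deltas against the bandit baseline. The names `evaluatePromotionGate`, `qualityDelta`, `costDelta`, and `p95LatencyDelta` are hypothetical and not taken from the ruflo analyzer; only the three thresholds come from the text above.

```javascript
// Hypothetical sketch of the three-criteria promotion gate.
// Deltas are fractions relative to the bandit baseline (0.02 means +2%).
const THRESHOLDS = { quality: 0.02, cost: 0.01, p95Latency: 0.05 };

function evaluatePromotionGate({ qualityDelta, costDelta, p95LatencyDelta }) {
  const checks = {
    quality: qualityDelta > THRESHOLDS.quality,
    cost: costDelta < THRESHOLDS.cost,
    p95Latency: p95LatencyDelta < THRESHOLDS.p95Latency,
  };
  // AND, not OR: a single failed criterion blocks promotion.
  const promote = checks.quality && checks.cost && checks.p95Latency;
  return { promote, checks };
}

// A clean win: better quality, no cost or latency regression.
const win = evaluatePromotionGate({ qualityDelta: 0.04, costDelta: 0.0, p95LatencyDelta: 0.01 });

// A large quality gain that costs 3% more is still rejected.
const costly = evaluatePromotionGate({ qualityDelta: 0.1, costDelta: 0.03, p95LatencyDelta: 0.01 });

console.log(win.promote, costly.promote);
```

Returning the per-criterion `checks` alongside the verdict keeps a rejected promotion explainable: the caller can see which criterion failed.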

**Why it matters:** routing is the most cost-sensitive decision the platform makes. A bad router quietly burns money for a long time before anyone notices. A learned router that's measurably better — not just believed to be — is real money.

ADR-150's static-analysis surface is now reachable from Claude Code agents as first-class MCP tools:

```
mcp__claude-flow__metaharness_score          "How harness-ready is this repo?"
mcp__claude-flow__metaharness_genome         "What kind of repo is this, structurally?"
mcp__claude-flow__metaharness_mcp_scan       "Any MCP-config security issues?"
mcp__claude-flow__metaharness_threat_model   "Worst threat in our agent surface?"
mcp__claude-flow__metaharness_oia_audit      "Snapshot all of the above to memory"
mcp__claude-flow__metaharness_audit_list     "What audits exist?"
mcp__claude-flow__metaharness_audit_trend    "Has anything gotten worse since last audit?"
```

These didn't exist before iter 20. Now an agent can ask "should this repo become a custom harness?" and get a quantitative answer from inside its normal tool-use loop — no shell-out to a separate CLI required.

**Verification depth:** the iter-23 runtime test invokes every handler with minimal input and asserts each returns `{success, data, degraded, exitCode}` without throwing. 65 assertions across 7 tools. CI runs this on every PR via `metaharness-ci.yml`.
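
The contract that test enforces can be sketched as a small shape check. This is an illustration only: the helper name `checkHandlerContract` and the stub handlers are assumptions, and the source states only that each handler returns `{success, data, degraded, exitCode}` without throwing.

```javascript
// Hypothetical contract check for an MCP tool handler.
// Only the four key names come from the document; everything else is assumed.
const REQUIRED_KEYS = ["success", "data", "degraded", "exitCode"];

async function checkHandlerContract(name, handler, input = {}) {
  let result;
  try {
    result = await handler(input);
  } catch (err) {
    // A handler that throws violates the contract outright.
    return { name, ok: false, reason: `threw: ${err.message}` };
  }
  if (result === null || typeof result !== "object") {
    return { name, ok: false, reason: "did not return an object" };
  }
  const missing = REQUIRED_KEYS.filter((key) => !(key in result));
  if (missing.length > 0) {
    return { name, ok: false, reason: `missing keys: ${missing.join(", ")}` };
  }
  return { name, ok: true, reason: null };
}

// Stub handlers standing in for real tool handlers.
const goodHandler = async () => ({ success: true, data: {}, degraded: false, exitCode: 0 });
const badHandler = async () => ({ success: true });

checkHandlerContract("good", goodHandler).then((report) => console.log(report));
checkHandlerContract("bad", badHandler).then((report) => console.log(report));
```

A check like this proves only the shape of the response. It says nothing about whether `data` is real, which is the gap the iter-26 drill later closed.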

The `npx ruflo eject` command is the most user-visible new capability. It takes the calling ruflo project and lifts it into a renamed standalone harness. The metaharness CLI does the heavy lifting (`metaharness --from-existing`); ruflo provides the safety gates:

- **Dry-run by default.** Just shows the plan. No `--confirm`, no writes.
- **Refuses to write to the calling repo.** Default target is `/tmp/ruflo-eject-<ts>-<name>/`. If you pass `--target` and it lands inside the repo, the command refuses with exit 2.
- **Refuses to overwrite.** Target dir must not exist.
- **10-minute hard timeout.** No hung subprocesses.
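
The first three safety gates amount to a small decision function. The sketch below is not ruflo's code: `planEject`, its option names, the `Date.now()` timestamp format, and every exit code other than the documented `2` are assumptions, and path handling is simplified to absolute POSIX paths. The timeout is omitted.

```javascript
// Hypothetical sketch of the eject safety gates (absolute POSIX paths only).
function normalize(absPath) {
  const out = [];
  for (const segment of absPath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

function isInside(parent, child) {
  const p = normalize(parent);
  const c = normalize(child);
  // Compare whole segments so "/repo-other" is not treated as inside "/repo".
  return c === p || c.startsWith(p === "/" ? "/" : p + "/");
}

function planEject({ repoRoot, name, target, confirm = false, exists = () => false, now = Date.now() }) {
  const resolved = normalize(target ?? `/tmp/ruflo-eject-${now}-${name}`);
  if (isInside(repoRoot, resolved)) {
    return { exitCode: 2, write: false, target: resolved, reason: "target is inside the calling repo" };
  }
  if (exists(resolved)) {
    // Exit code 1 is an assumption; the document does not state it.
    return { exitCode: 1, write: false, target: resolved, reason: "target already exists" };
  }
  // Without --confirm the command only reports the plan.
  return { exitCode: 0, write: confirm, target: resolved, reason: confirm ? "writing" : "dry run" };
}

console.log(planEject({ repoRoot: "/home/dev/project", name: "my-harness" }));
console.log(planEject({ repoRoot: "/home/dev/project", name: "my-harness", target: "/home/dev/project/out" }));
```

Normalizing before comparing matters: a target such as `/tmp/../home/dev/project/out` would otherwise slip past a naive prefix check.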

Why it matters: this is the user's exit ramp. Without an eject command, adopting ruflo feels like a one-way decision. With it, the message is "use ruflo, and if you outgrow it, take a focused version with you." Paradoxically, the exit ramp increases willingness to commit.

Iters 7, 8, 15, 16 built a complete audit pipeline that runs on its own:

- **`oia-audit`** bundles three orthogonal static checks (`oia-manifest`, `threat-model`, `mcp-scan`) into one timestamped record with a composite worst severity. Writes to the `metaharness-audit` memory namespace.
- **Weekly cron** (`.github/workflows/oia-audit-weekly.yml`) fires the audit every Sunday at 04:17 UTC. Artifacts kept 90 days. Fails the workflow if composite worst hits HIGH.
- **`audit-list`** enumerates records. Show me the last 10, the last 30 days, etc.
- **`audit-trend`** diffs any two records. Surfaces the composite-severity delta, per-component status drift, findings introduced vs cleared.

The shape mirrors ruflo's existing cost-tracker observability (track → list → diff). Same mental model: snapshot continuously, compare on demand.
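
To make the snapshot-then-diff shape concrete, here is a minimal sketch of a composite worst severity and a two-record diff. The record layout, the severity scale, and the function names `compositeWorst` and `auditTrend` are assumptions for illustration; the real `oia-audit` record format is not shown in this walkthrough.

```javascript
// Hypothetical severity scale, ordered from best to worst.
const SEVERITY_ORDER = ["none", "low", "medium", "high", "critical"];
const rank = (severity) => SEVERITY_ORDER.indexOf(severity);

// components: one severity per static check, e.g. { "mcp-scan": "low" }.
function compositeWorst(components) {
  return Object.values(components).reduce(
    (worst, severity) => (rank(severity) > rank(worst) ? severity : worst),
    "none"
  );
}

function auditTrend(baseline, current) {
  const before = new Set(baseline.findings);
  const after = new Set(current.findings);
  const baseWorst = compositeWorst(baseline.components);
  const currWorst = compositeWorst(current.components);
  return {
    baselineWorst: baseWorst,
    currentWorst: currWorst,
    severityDelta: rank(currWorst) - rank(baseWorst),
    worsened: rank(currWorst) > rank(baseWorst),
    introduced: [...after].filter((id) => !before.has(id)),
    cleared: [...before].filter((id) => !after.has(id)),
  };
}

const baselineRecord = {
  components: { "oia-manifest": "none", "threat-model": "medium", "mcp-scan": "low" },
  findings: ["F1", "F2"],
};
const currentRecord = {
  components: { "oia-manifest": "none", "threat-model": "high", "mcp-scan": "none" },
  findings: ["F2", "F3"],
};

const trend = auditTrend(baselineRecord, currentRecord);
console.log(trend);
```

In this example one component improved and another got worse. The composite verdict still worsens, because it follows the worst component rather than an average.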

This is the most important contribution of the whole ADR. It's not a feature — it's a rule that every feature has to honor:

> Ruflo remains operational if every MetaHarness package is removed.

Four enforcement mechanisms:

1. **Removable.** `npm ls --without @metaharness/*` must still produce a working CLI. Verified by static dep-grep on every PR.
2. **Optional in package.json.** Every `@metaharness/*` is in `optionalDependencies`, never `dependencies`. Static grep on every PR.
3. **Graceful degradation.** Every code path that touches MetaHarness catches `MODULE_NOT_FOUND` and emits a structured `{degraded: true}` payload. Smoke greps assert this in source.
4. **CI gate.** `.github/workflows/no-metaharness-smoke.yml` simulates an unresolvable npm registry, runs every skill, and asserts each exits 0 + emits degraded JSON. Catches anything the static checks miss.
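
The graceful-degradation mechanism can be sketched as a thin wrapper around a dynamic import. This is an assumption-laden illustration, not the plugin's code: `loadOptional`, the injectable `importer` parameter, and the `module` and `reason` fields are hypothetical; only the `MODULE_NOT_FOUND` check and the `{degraded: true}` payload come from the text above.

```javascript
// Hypothetical optional-dependency loader.
// require() reports MODULE_NOT_FOUND; dynamic import() reports ERR_MODULE_NOT_FOUND.
const MISSING_CODES = new Set(["MODULE_NOT_FOUND", "ERR_MODULE_NOT_FOUND"]);

async function loadOptional(specifier, importer = (s) => import(s)) {
  try {
    const mod = await importer(specifier);
    return { degraded: false, module: mod };
  } catch (err) {
    if (err && MISSING_CODES.has(err.code)) {
      // A missing optional package is an expected state, not a failure.
      return { degraded: true, module: null, reason: `${specifier} is not installed` };
    }
    // Anything else is a real bug and must stay loud.
    throw err;
  }
}

// Simulate an absent package with a stub importer (no registry access needed).
const missingImporter = async () => {
  const err = new Error("Cannot find package");
  err.code = "ERR_MODULE_NOT_FOUND";
  throw err;
};

loadOptional("@metaharness/router", missingImporter).then((result) => {
  console.log(JSON.stringify(result));
});
```

The catch is deliberately narrow. A handler that swallowed every error would also report `degraded: true` for a broken invocation, hiding the bug instead of surfacing it.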

Why this matters: if you don't write this rule down explicitly, integrations slowly turn into hard dependencies. Tomorrow a clever optimization "just for performance", next year a kernel that "really should be present". The rule freezes a permanent boundary: MetaHarness is augmentation, not foundation.

ADR-150 has the **deepest verification coverage of any feature in ruflo**:

- **Compile-time**: `tsc` clean on every TS source file
- **Structural**: smoke contract greps source for safe patterns (35 invariants in the metaharness plugin)
- **Compat tripwire**: `scripts/check-metaharness-compat.mjs` exercises the upstream API surface (9/9 against current router; catches breaking changes BEFORE we publish a release that would break at runtime)
- **End-to-end pipeline**: `test-parallel-pipeline.mjs` (25 assertions): recorder JSON shape → analyzer reads → 3-criteria AND-gate evaluates → strict-mode exit code matches verdict
- **MCP runtime contract**: `test-mcp-tools.mjs` (65 assertions): every tool handler invoked, returns `{success, data, degraded, exitCode}` without throwing
- **Graceful-degradation drill**: points each skill at an unresolvable registry, asserts exit 0 + degraded JSON
- **GCP-secret × OpenRouter × scaffold × lifecycle**: `test-with-openrouter.mjs` (11 assertions): fetch secret, auth against OpenRouter (337 models verified), scaffold a harness, run doctor + score + genome + mcp-scan, cleanup
- **Performance**: `bench-recordpair-overhead.mjs` measures the iter-12 dispatch overhead at 147ns/call and gates regressions at 500ns

Every claim in the implementation has either compile-time proof, structural proof, runtime proof, measured proof, or cross-cutting proof.
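
As an illustration of the "measured proof" category, a regression gate on per-call overhead might look like the sketch below. It is not `bench-recordpair-overhead.mjs`: the function names, the warm-up count, the stand-in `recordPair`, and the choice of `<=` for the limit are all assumptions. Only the 500ns limit comes from the list above.

```javascript
// Hypothetical micro-benchmark gate; only the 500 ns limit is from the document.
function measureNsPerCall(fn, iterations = 1_000_000) {
  for (let i = 0; i < 10_000; i++) fn(); // warm up the JIT before timing
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedNs = process.hrtime.bigint() - start;
  return Number(elapsedNs) / iterations;
}

function gateOverhead(nsPerCall, limitNs = 500) {
  return { nsPerCall, limitNs, pass: nsPerCall <= limitNs };
}

// Stand-in for the default path: logging is disabled, so the recorder returns early.
const recordPair = (enabled) => {
  if (!enabled) return;
};

const result = gateOverhead(measureNsPerCall(() => recordPair(false)));
console.log(result.pass ? "ok" : "regression", result.nsPerCall.toFixed(1), "ns/call");
```

A loop this small is easy for the JIT to optimize away, so the number it prints is a rough lower bound. A real benchmark would need to do observable work inside the loop.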

ADR-150 is visible at every standard ruflo entry point:

- The `npx ruflo init` Next-steps tip recommends running `metaharness score`
- `npx ruflo doctor` lists `MetaHarness (ADR-150)` as a check with version + install hint
- `npx ruflo metaharness <subcommand>` dispatches all 8 subcommands (score, genome, mcp-scan, threat-model, oia-audit, audit-list, audit-trend, mint)
- `npx ruflo eject` is its own top-level command
- `npx ruflo plugins list --type harness` filters the registry to MetaHarness-generated harnesses
- `npx claude-flow hooks worker dispatch` help text mentions `oia-audit` alongside the 12 original workers
- Project CLAUDE.md has a comprehensive section so future agents discover the integration on file load

Three days into the integration, the test prompted by the question "can we test the harnesses using the OpenRouter token from GCP secret?" found a 26-iteration-old bug masked by our own graceful-degradation path.

**The bug:** `_harness.mjs::runMetaharness` was passing `'-y metaharness@latest'` as a single argv element to `npx` via `spawnSync('npx', [bin, ...argv])`. `spawnSync` with `shell:false` doesn't split on whitespace, so `npx` received one argument containing an embedded space. Every skill that used the shared bridge was silently degrading instead of running.

**Why it stayed hidden:** ADR-150's architectural-constraint rule #3 says skills must emit `{degraded: true}` and exit 0 when MetaHarness isn't available. The buggy invocation triggered that path. Every smoke check passed; every test reported success; every CI workflow stayed green. The system was lying to itself.

**What broke the lie:** an integration test that exercised the real path end-to-end and verified a real outcome. `score` against ruflo returned the actual `harnessFit: 82` numbers from the upstream CLI, not the degraded payload.

**The lesson:** graceful degradation is necessary, but it MUST be paired with proof that the non-degraded path also works. The iter-19 drill (assert degraded:true when metaharness is genuinely absent) wasn't enough on its own. The iter-26 drill (assert real-data:true when metaharness IS present) was the missing half.

Both halves are now wired into CI.
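
The two halves can be expressed as one check that takes the expected environment as an input. This is a sketch under assumptions: `checkDrill` and the payload layout (`data.harnessFit` nested under the `{success, data, degraded, exitCode}` envelope) are inferred from fields quoted in this document, not copied from the CI scripts.

```javascript
// Hypothetical two-sided drill; the payload layout is assumed.
function checkDrill({ metaharnessPresent, exitCode, payload }) {
  if (exitCode !== 0) return { ok: false, reason: `exit code ${exitCode}` };

  if (!metaharnessPresent) {
    // Half 1 (iter 19): an absent upstream must degrade cleanly.
    return payload.degraded === true
      ? { ok: true, reason: "degraded as expected" }
      : { ok: false, reason: "expected a degraded payload" };
  }

  // Half 2 (iter 26): a present upstream must not degrade and must return real data.
  if (payload.degraded) {
    return { ok: false, reason: "degraded although metaharness is installed" };
  }
  const fit = payload.data && payload.data.harnessFit;
  return typeof fit === "number"
    ? { ok: true, reason: "real data returned" }
    : { ok: false, reason: "no real score in payload" };
}

// The hidden bug: metaharness was installed, yet every skill degraded and exited 0.
const masked = checkDrill({ metaharnessPresent: true, exitCode: 0, payload: { degraded: true } });
console.log(masked);
```

Under the first half alone, the `masked` case is indistinguishable from a healthy run on a machine without MetaHarness. It fails only once the drill knows which environment it is in.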

Three real workflows the integration enables:

```
npx ruflo metaharness score /path/to/repo
```

Returns a scorecard in 500ms. `harnessFit ≥ 70` and `scaffoldReady: true` together mean "yes, you've got the bones". `recommendedMode` tells you whether CLI-only or CLI+MCP makes sense. `template` recommends the closest pre-built vertical (legal, devops, support, etc.).

```
npx ruflo eject --name my-harness               # dry-run first — see the plan
npx ruflo eject --name my-harness --confirm     # then commit
cd /tmp/ruflo-eject-<ts>-my-harness
npm install
npx harness doctor
```

You get a renamed standalone harness with attribution preserved. Take it, modify it, publish it, never look at ruflo again — or stay and use both.

```
# In CI, weekly:
node plugins/ruflo-metaharness/scripts/oia-audit.mjs --alert-on-worst high

# Locally, ad-hoc:
npx ruflo metaharness audit-list --since 30d
npx ruflo metaharness audit-trend --baseline-key <a> --current-key <b> --alert-on-worsening
```

You see drift in the MCP threat model and which findings were introduced or cleared, and the worst-severity verdict tells you whether the trajectory is OK.

- **KRR retrain from production trajectories**: wired but data-blocked. Needs 50+ real routing decisions; pipeline ready to consume the data when we have it.
- **`@metaharness/kernel` ToolDispatcher as primary MCP dispatch**: touching 314 tools at v0.1.0 of an upstream package is too high a blast radius. Deferred to a follow-up ADR.
- **Phase 3 Harness Intelligence Layer**: explicitly scope-only in ADR-150 (genome similarity search, harness recommendation, fleet drift, cross-harness capability graph, plugin compatibility). Each needs its own ADR.
- **Direct merge to main**: work is on the `feat/metaharness-integration-research` branch, 28 iterations of commits, ready for review.

```
Branch:              feat/metaharness-integration-research
Iterations of /loop: 28
Commits:             ~28
Net new files:       ~20 (plugin + tests + scripts + CI + ADR)
Plugin skills:       6 (score, genome, mcp-scan, threat-model, oia-audit, mint)
CLI subcommands:     8 (5 above + audit-list + audit-trend + mint)
MCP tools:           7 (same 5 + audit-list + audit-trend)
CI workflows:        3 dedicated + integration with v3-ci.yml
Smoke invariants:    35 (in the plugin) + 422+ fleet-wide
Verification layers: 7 (tsc, structural, compat, e2e pipeline, MCP runtime,
                        graceful drill, GCP-OpenRouter integration)
Default-path overhead measured: 147ns per route() call
Architectural-constraint violations to date: 0
Upstream bugs filed: 1 (--target flag ignored)
```

The integration is complete enough that the only outstanding work needs production data, not more code.

- **Decision**: `v3/docs/adr/ADR-150-metaharness-integration-surfaces.md`
- **Plan**: [Issue #2399: phase tracker](https://github.com/ruvnet/ruflo/issues/2399)
- **Research dossier**: [original gist](https://gist.github.com/ruvnet/19d166ff9acf368c9da4172d91ac9113) (graded evidence per claim)
- **Upstream bug**: `ruvnet/agent-harness-generator#9`
- **Branch**: `feat/metaharness-integration-research`
- **MetaHarness upstream**: `ruvnet/agent-harness-generator`
