# Show HN: FusionHarness – An Open-source Mixture-of-Agents compound-model server

> Source: <https://github.com/jackulau/fusionHarness>
> Published: 2026-06-17 14:59:07+00:00

Open-source

Mixture-of-Agentscompound-model server — a self-hostable alternative to OpenRouter's Fusion API.

Fan a prompt out to a **panel** of LLMs in parallel, let a **judge** extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then a **synthesizer** writes one final answer grounded in that analysis. The result beats any single panelist — and a panel of *budget* models can rival a frontier model at a fraction of the cost.

It speaks the **OpenAI API**, so it drops into any existing OpenAI client: point `base_url`

at fusionHarness and use the model slug `fusion`

.

```
          ┌─────────────┐
prompt ─► │   fan-out   │ ─► model A ─┐
          │   (panel)   │ ─► model B ─┤  (parallel, each tool-enabled)
          └─────────────┘ ─► model C ─┘
                                  │
                                  ▼
                            ┌───────────┐     ┌──────────────┐
                            │   judge   │ ──► │ synthesizer  │ ─► final answer
                            │ structure │     │  grounded    │    + cost / latency
                            └───────────┘     └──────────────┘
```

Why it works (OpenRouter's own ablation): ~¾ of the lift comes from

synthesis, ~¼ fromdiversity.

```
# 1. Install
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure — one OpenRouter key reaches every model in the catalog
cp .env.example .env       # then put your key in FUSION_API_KEY

# 3. Run the OpenAI-compatible server (omit --config to use the built-in budget panel)
fusion serve --config configs/budget.yaml

# 4. Call it like any OpenAI endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fusion","messages":[{"role":"user","content":"Compare CRDTs vs OT for collaborative editing."}]}'
```

From the OpenAI Python SDK (`pip install openai`

):

``` python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="fusion",
    messages=[{"role": "user", "content": "..."}],
)
print(resp.choices[0].message.content)
```

Or straight from the terminal, no server:

```
export FUSION_API_KEY=sk-or-...
fusion ask "What are the trade-offs between gRPC and REST?" --config configs/budget.yaml
```

Because fusion speaks OpenAI, it drops into any agent harness — or use our own.

```
# Our own TUI harness — streaming, multi-turn chat (/reset /stats /help /exit)
fusion chat --config configs/budget.yaml

# Pi (pi.dev): install the package, register the provider, point Pi at it
pi install ./integrations/pi
bash integrations/pi/install.sh
pi --model fusion
```

Adapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are
in [integrations/](/jackulau/fusionHarness/blob/main/integrations/README.md). Verify the whole stack end-to-end with
no API key:

```
scripts/smoke.sh --fake     # boots a key-free fake backend + the real server
```

fusionHarness is also its own **agentic coding harness** — like Claude Code, but
the brain can convene the fusion panel. The agent reads, writes, and edits files,
searches, and runs bash in a tool-use loop confined to a project root, and can
call `council`

to escalate a hard sub-question to the full panel.

```
fusion code "add a /version endpoint and a test for it" --root .
fusion code                       # interactive agent session
fusion code "refactor X" --plan   # write a plan first, then act
fusion code "delete dead code" --approve   # confirm each file/bash action
```

Each step is printed as it happens; the agent calls `finish`

when the task is
done and verified. Tools are confined to `--root`

(default: cwd). For a hard
sub-problem the agent can call the `council`

tool, which convenes the full fusion
panel and returns a synthesized answer. `--approve`

gates every mutating tool
(write/edit/bash); `--plan`

makes it write a numbered plan before acting.

⚠️ Security:the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects — run it on projects you trust, or in a container.

A config picks the panel, judge, and synthesizer. Two presets ship in `configs/`

:

| Preset | Panel | Use it for |
|---|---|---|
`configs/budget.yaml` |
Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro | frontier-ish quality at ~half the price |
`configs/frontier.yaml` |
Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro | beyond-frontier quality |

Custom panel:

```
# my-panel.yaml
name: fusion
panel:
  - anthropic/claude-opus-4.8
  - openai/gpt-5.5
  - model: deepseek/deepseek-v4-pro    # long form allows per-model overrides
    temperature: 0.3
    tools: [web_search]
judge: openai/gpt-5.5
synthesizer: anthropic/claude-opus-4.8
temperature: 0.7
max_tokens: 4096
tools_enabled: false
```

Model slugs follow OpenRouter conventions (`vendor/model`

). Point at a different
backend with `FUSION_BASE_URL`

(OpenAI, a local vLLM/Ollama server, Groq,
Together — anything OpenAI-compatible). API keys come from the environment only
(`FUSION_API_KEY`

, `OPENROUTER_API_KEY`

, or `OPENAI_API_KEY`

), never from YAML.

All optional config fields (with defaults):

| Key | Default | What it does |
|---|---|---|
`refine` |
`false` |
Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑). |
`layers` |
`1` |
Multi-layer MoA — with `layers>1` , proposers see the previous layer's drafts and improve before the final synthesis. |
`samples` |
`1` |
Self-consistency — sample each proposer K times so the judge/synthesizer see more drafts. |
`diversity` |
`true` |
Spread panelist temperatures so drafts differ (≈¼ of the lift). |
`diversity_jitter` |
`0.3` |
How wide to spread temperatures (the MoA diversity↔quality trade-off — keep it modest). |
`max_retries` |
`2` |
Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel. |
`retry_backoff` |
`0.5` |
Base seconds for exponential retry backoff. |
`max_concurrency` |
`0` |
Cap concurrent panelist calls (0 = unlimited). |

If the **judge** fails, synthesis still runs from the raw responses; if the
**synthesizer** fails, the best panelist's answer is returned. Anything that
degraded is reported in the response's `fusion.degraded`

list — never silently.

Panelists can call tools while drafting — useful for deep-research tasks. Tools
are **off by default**. Enable globally and per-model:

```
tools_enabled: true
panel:
  - model: deepseek/deepseek-v4-pro
    tools: [web_search, bash]
```

`web_search`

— keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via`default_registry(search_fn=...)`

.`bash`

— runs in a sandboxed shell (timeout, stripped env, output truncation).

⚠️ Security:`bash`

executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling`bash`

with untrusted input. It is opt-in because it is dangerous.

Every response carries the real numbers. Non-streaming responses include a
`fusion`

block plus headers:

```
{
  "choices": [ ... ],
  "usage": { "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801 },
  "fusion": {
    "config": "fusion",
    "panel_models": ["google/gemini-3-flash", "moonshotai/kimi-k2.6", "deepseek/deepseek-v4-pro"],
    "panel_succeeded": 3,
    "cost_usd": 0.0123,
    "cost_breakdown": [ { "model": "...", "role": "panel", "cost_usd": 0.004 }, ... ],
    "timing_s": { "panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3 }
  }
}
```

Headers: `x-fusion-cost-usd`

, `x-fusion-latency-s`

. When the backend reports an
authoritative per-call cost, that value is used instead of the local price table
(`fusion/pricing.py`

).

Reproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):

```
# deterministic stubs — no API key, proves the pipeline
fusion eval --dry-run

# A/B: solo vs panel vs panel+refine, with deltas vs the best solo
fusion eval --ab --dry-run

# live: grade with an LLM judge (needs an API key)
fusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml
Panel vs solo — scored on 3 task(s)

panel+refine                     100.0%  ████████████████████  ★ (+28.6 vs best solo)
panel                             76.2%  ███████████████  ★ (+4.8 vs best solo)
google/gemini-3-flash (solo)      71.4%  ██████████████
```

`--runs N`

repeats each task K times (self-consistency). Add your own tasks in
`eval/tasks.sample.yaml`

(id, prompt, weighted criteria). The dry-run numbers are
from deterministic stubs — **real** lift needs a key; run the live command above.

| Method | Path | Description |
|---|---|---|
`POST` |
`/v1/chat/completions` |
OpenAI-compatible; `stream:true` supported. Model slug `fusion` . |
`GET` |
`/v1/models` |
Lists `fusion` plus the configured panel models. |
`GET` |
`/health` |
Liveness + active config + panel. |

**Per-request overrides.** Customize the panel with per-request overrides (like
OpenRouter Fusion's "pass your own participant models and synthesizer") via a
`fusion`

block in the body. Only safe model-selection/flag keys are honored — the
backend URL and keys can never be set from the request:

```
{
  "model": "fusion",
  "messages": [{"role": "user", "content": "..."}],
  "fusion": {
    "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
    "synthesizer": "anthropic/claude-opus-4.8",
    "refine": true,
    "layers": 2
  }
}
fusion/        engine + server + harnesses
               ├─ providers · panel · judge · synthesize · fusion   (MoA engine)
               ├─ tools · server · streaming · schemas · pricing · config
               ├─ tui.py        (fusion chat — TUI harness)
               └─ agent.py · agent_tools.py · cli.py   (fusion code — agent harness)
eval/          DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)
configs/       panel presets (budget.yaml, frontier.yaml)
integrations/  harness adapters — Pi package, OpenAI-SDK example, adapter guide
scripts/       smoke.sh, verify_install.sh, verify_all.sh
docs/          architecture.md, parity.md
tests/         pytest suite — unit (providers mocked) + real-HTTP e2e (live servers), no API key
pip install -e ".[dev]"
pytest -q          # full suite, no network required
```

See [docs/architecture.md](/jackulau/fusionHarness/blob/main/docs/architecture.md) for the design and
[docs/parity.md](/jackulau/fusionHarness/blob/main/docs/parity.md) for the parity matrix & roadmap.

MIT — see [LICENSE](/jackulau/fusionHarness/blob/main/LICENSE).
