
Show HN: FusionHarness – An Open-source Mixture-of-Agents compound-model server

FusionHarness, an open-source mixture-of-agents compound-model server, was released as a self-hostable alternative to OpenRouter's Fusion API. It fans out prompts to a panel of LLMs in parallel, uses a judge to extract structure, and a synthesizer to produce a final answer, outperforming any single model and rivaling frontier models at lower cost. The server speaks the OpenAI API, enabling integration with existing clients and agent harnesses.

7 min read · published Jun 17, 2026

Open-source Mixture-of-Agents compound-model server: a self-hostable alternative to OpenRouter's Fusion API.

Fan a prompt out to a panel of LLMs in parallel, let a judge extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then a synthesizer writes one final answer grounded in that analysis. The result beats any single panelist — and a panel of budget models can rival a frontier model at a fraction of the cost.

It speaks the OpenAI API, so it drops into any existing OpenAI client: point base_url at fusionHarness and use the model slug fusion.

          ┌─────────────┐
prompt ─► │   fan-out   │ ─► model A ─┐
          │   (panel)   │ ─► model B ─┤  (parallel, each tool-enabled)
          └─────────────┘ ─► model C ─┘
                                  │
                                  ▼
                            ┌───────────┐     ┌──────────────┐
                            │   judge   │ ──► │ synthesizer  │ ─► final answer
                            │ structure │     │  grounded    │    + cost / latency
                            └───────────┘     └──────────────┘

Why it works (OpenRouter's own ablation): ~¾ of the lift comes from synthesis, ~¼ from diversity.

python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

cp .env.example .env       # then put your key in FUSION_API_KEY

fusion serve --config configs/budget.yaml

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fusion","messages":[{"role":"user","content":"Compare CRDTs vs OT for collaborative editing."}]}'

From the OpenAI Python SDK (pip install openai):

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="fusion",
    messages=[{"role": "user", "content": "..."}],
)
print(resp.choices[0].message.content)

Or straight from the terminal, no server:

export FUSION_API_KEY=sk-or-...
fusion ask "What are the trade-offs between gRPC and REST?" --config configs/budget.yaml

Because fusion speaks OpenAI, it drops into any agent harness — or use our own.

fusion chat --config configs/budget.yaml

pi install ./integrations/pi
bash integrations/pi/install.sh
pi --model fusion

Adapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are in integrations/. Verify the whole stack end-to-end with no API key:

scripts/smoke.sh --fake     # boots a key-free fake backend + the real server

fusionHarness is also its own agentic coding harness: like Claude Code, but the brain can convene the fusion panel. The agent reads, writes, and edits files, searches, and runs bash in a tool-use loop confined to a project root, and can call council to escalate a hard sub-question to the full panel.

fusion code "add a /version endpoint and a test for it" --root .
fusion code                       # interactive agent session
fusion code "refactor X" --plan   # write a plan first, then act
fusion code "delete dead code" --approve   # confirm each file/bash action

Each step is printed as it happens; the agent calls finish when the task is done and verified. Tools are confined to --root (default: cwd). For a hard sub-problem the agent can call the council tool, which convenes the full fusion panel and returns a synthesized answer. --approve gates every mutating tool (write/edit/bash); --plan makes it write a numbered plan before acting.

⚠️ Security: the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects, so run it on projects you trust, or in a container.
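
To make "confinement blocks path escapes" concrete, here is one common way to implement such a check. It is a sketch under assumptions, not the project's code: resolve_inside is a hypothetical helper, and the real agent tools may validate paths differently.

```python
from pathlib import Path


def resolve_inside(root: str, user_path: str) -> Path:
    """Resolve user_path against root; refuse anything that lands outside it."""
    root_dir = Path(root).resolve()
    # Joining with an absolute user_path discards root_dir, so absolute
    # paths are caught by the same containment check below.
    candidate = (root_dir / user_path).resolve()
    if candidate != root_dir and root_dir not in candidate.parents:
        raise PermissionError(f"{user_path!r} escapes {root_dir}")
    return candidate
```

Resolving before comparing is what defeats ../ sequences and symlinks that point out of the tree. As the warning above says, a check like this does nothing about what a bash command does once it runs.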

A config picks the panel, judge, and synthesizer. Two presets ship in configs/:

| Preset | Panel | Use it for |
|---|---|---|
| configs/budget.yaml | Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro | frontier-ish quality at ~half the price |
| configs/frontier.yaml | Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro | beyond-frontier quality |

Custom panel:

name: fusion
panel:
  - anthropic/claude-opus-4.8
  - openai/gpt-5.5
  - model: deepseek/deepseek-v4-pro    # long form allows per-model overrides
    temperature: 0.3
    tools: [web_search]
judge: openai/gpt-5.5
synthesizer: anthropic/claude-opus-4.8
temperature: 0.7
max_tokens: 4096
tools_enabled: false

Model slugs follow OpenRouter conventions (vendor/model). Point at a different backend with FUSION_BASE_URL (OpenAI, a local vLLM/Ollama server, Groq, Together: anything OpenAI-compatible). API keys come from the environment only (FUSION_API_KEY, OPENROUTER_API_KEY, or OPENAI_API_KEY), never from YAML.
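
An environment-only key lookup can be as small as the sketch below. The three variable names come from the text above; the order of precedence and the helper name resolve_api_key are assumptions made for illustration.

```python
import os

# Variable names from the README; checking them in this order is an assumption.
KEY_VARS = ("FUSION_API_KEY", "OPENROUTER_API_KEY", "OPENAI_API_KEY")


def resolve_api_key(env=None) -> str:
    """Return the first non-empty key found in the environment."""
    env = os.environ if env is None else env
    for name in KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("no API key set; export one of: " + ", ".join(KEY_VARS))
```

Passing env explicitly keeps the function easy to test without touching the real process environment.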

All optional config fields (with defaults):

| Key | Default | What it does |
|---|---|---|
| refine | false | Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑). |
| layers | 1 | Multi-layer MoA: with layers>1, proposers see the previous layer's drafts and improve before the final synthesis. |
| samples | 1 | Self-consistency: sample each proposer K times so the judge/synthesizer see more drafts. |
| diversity | true | Spread panelist temperatures so drafts differ (≈¼ of the lift). |
| diversity_jitter | 0.3 | How wide to spread temperatures (the MoA diversity↔quality trade-off; keep it modest). |
| max_retries | 2 | Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel. |
| retry_backoff | 0.5 | Base seconds for exponential retry backoff. |
| max_concurrency | 0 | Cap concurrent panelist calls (0 = unlimited). |
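
Two of these knobs are easier to picture with numbers. The sketch below shows one plausible reading of diversity_jitter and retry_backoff; both formulas (an even spread around the base temperature, and delays that double per attempt) are assumptions for illustration, and the project's exact arithmetic may differ.

```python
def spread_temperatures(base: float, jitter: float, n: int) -> list:
    """Evenly space n temperatures across [base - jitter, base + jitter], floored at 0."""
    if n == 1:
        return [base]
    step = 2 * jitter / (n - 1)
    return [round(max(0.0, base - jitter + i * step), 3) for i in range(n)]


def backoff_delays(base: float, max_retries: int) -> list:
    """Exponential backoff: base, 2*base, 4*base, ... with one delay per retry."""
    return [base * 2**attempt for attempt in range(max_retries)]


# Defaults from the table above, with a three-model panel at temperature 0.7.
print(spread_temperatures(0.7, 0.3, 3))
print(backoff_delays(0.5, 2))
```

Under these assumptions, a three-model panel would draft at roughly 0.4, 0.7, and 1.0, and a failing call would wait 0.5 s and then 1.0 s before giving up.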

If the judge fails, synthesis still runs from the raw responses; if the synthesizer fails, the best panelist's answer is returned. Anything that degraded is reported in the response's fusion.degraded list, so degradation is never silent.
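
The fallback chain described above can be sketched as follows. This is an illustration only: fuse_with_fallback is a hypothetical function, judge and synthesize are plain callables standing in for model calls, and picking the longest draft is a placeholder because the README does not say how the "best" panelist is chosen.

```python
def fuse_with_fallback(drafts: dict, judge, synthesize) -> dict:
    """Run judge then synthesizer, recording every stage that had to be skipped."""
    degraded = []

    try:
        analysis = judge(drafts)
    except Exception:
        analysis = None  # synthesis proceeds from the raw drafts alone
        degraded.append("judge")

    try:
        answer = synthesize(drafts, analysis)
    except Exception:
        degraded.append("synthesizer")
        # Placeholder heuristic (assumption): treat the longest draft as "best".
        answer = max(drafts.values(), key=len)

    return {"answer": answer, "degraded": degraded}
```

Returning the degraded list alongside the answer is what lets a caller tell a full panel result apart from a fallback.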

Panelists can call tools while drafting — useful for deep-research tasks. Tools are off by default. Enable globally and per-model:

tools_enabled: true
panel:
  - model: deepseek/deepseek-v4-pro
    tools: [web_search, bash]

web_search: keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via default_registry(search_fn=...).

bash: runs in a sandboxed shell (timeout, stripped env, output truncation).

⚠️ Security: bash executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling bash with untrusted input. It is opt-in because it is dangerous.
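
The three protections named for the bash tool (timeout, stripped env, output truncation) map directly onto subprocess options. The sketch below assumes a POSIX system with /bin/sh; run_limited and its default limits are hypothetical and are not the project's implementation.

```python
import subprocess


def run_limited(command: str, timeout_s: float = 10.0, max_chars: int = 4000) -> str:
    """Run a shell command with a time limit, a minimal environment, and capped output."""
    try:
        proc = subprocess.run(
            ["/bin/sh", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            # Stripped environment: the child sees only PATH, so API keys
            # exported in the parent process are not inherited.
            env={"PATH": "/usr/bin:/bin"},
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout_s}s]"
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        output = output[:max_chars] + "\n[output truncated]"
    return output
```

None of this limits what the command can do to the filesystem or network while it runs, which is why the warning above recommends a disposable VM or container.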

Every response carries the real numbers. Non-streaming responses include a fusion block plus headers:

{
  "choices": [ ... ],
  "usage": { "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801 },
  "fusion": {
    "config": "fusion",
    "panel_models": ["google/gemini-3-flash", "moonshotai/kimi-k2.6", "deepseek/deepseek-v4-pro"],
    "panel_succeeded": 3,
    "cost_usd": 0.0123,
    "cost_breakdown": [ { "model": "...", "role": "panel", "cost_usd": 0.004 }, ... ],
    "timing_s": { "panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3 }
  }
}

Headers: x-fusion-cost-usd, x-fusion-latency-s. When the backend reports an authoritative per-call cost, that value is used instead of the local price table (fusion/pricing.py).
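
A client can turn the fusion block into a one-glance summary. The sketch below reads only fields shown in the sample response above; summarize_fusion is a hypothetical helper, and it assumes the body has already been parsed into a dict.

```python
def summarize_fusion(body: dict) -> dict:
    """Condense the fusion block of a parsed response into a few headline numbers."""
    fusion = body["fusion"]

    # Total the per-call costs by role (for example "panel").
    cost_by_role = {}
    for item in fusion.get("cost_breakdown", []):
        role = item["role"]
        cost_by_role[role] = cost_by_role.get(role, 0.0) + item["cost_usd"]

    return {
        "panel": f'{fusion["panel_succeeded"]}/{len(fusion["panel_models"])}',
        "cost_usd": fusion["cost_usd"],
        "total_s": fusion["timing_s"]["total"],
        "cost_by_role": cost_by_role,
    }
```

When only the totals matter, the same cost and latency are available from the x-fusion-cost-usd and x-fusion-latency-s headers without parsing the body.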

Reproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):

fusion eval --dry-run

fusion eval --ab --dry-run

fusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml
Panel vs solo — scored on 3 task(s)

panel+refine                     100.0%  ████████████████████  ★ (+28.6 vs best solo)
panel                             76.2%  ███████████████  ★ (+4.8 vs best solo)
google/gemini-3-flash (solo)      71.4%  ██████████████

--runs N repeats each task N times (self-consistency). Add your own tasks in eval/tasks.sample.yaml (id, prompt, weighted criteria). The dry-run numbers are from deterministic stubs; real lift needs a key, so run the live command above.
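
To see why negative criteria make bluffing expensive, consider a simple weighted scorer. This is a sketch under assumptions: the id/weight schema and the formula (earned weight over total positive weight, floored at zero) are illustrative and may not match the scorer in eval/.

```python
def score_answer(criteria: list, met: set) -> float:
    """Percentage score: satisfied weights over total positive weight, floored at 0."""
    # Negative-weight criteria describe wrong claims; "meeting" one subtracts points.
    earned = sum(c["weight"] for c in criteria if c["id"] in met)
    possible = sum(c["weight"] for c in criteria if c["weight"] > 0)
    return max(0.0, 100 * earned / possible)


criteria = [
    {"id": "mentions-convergence", "weight": 3},
    {"id": "mentions-server-role", "weight": 2},
    {"id": "claims-ot-needs-no-server", "weight": -2},  # a wrong claim
]
print(score_answer(criteria, {"mentions-convergence", "mentions-server-role"}))
print(score_answer(criteria, {"mentions-convergence", "claims-ot-needs-no-server"}))
```

In this toy rubric, an answer that covers one good point but also asserts the wrong claim scores well below an answer that simply covers both good points.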

| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible; stream:true supported. Model slug fusion. |
| GET | /v1/models | Lists fusion plus the configured panel models. |
| GET | /health | Liveness + active config + panel. |
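
With stream:true, OpenAI-compatible servers send server-sent events: data: lines carrying JSON chunks, terminated by data: [DONE]. The sketch below reassembles the text from such lines. It assumes fusionHarness follows the standard OpenAI chunk shape (choices[0].delta.content); collect_stream is a hypothetical helper.

```python
import json


def collect_stream(lines) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines until [DONE]."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if choices:
            delta = choices[0].get("delta", {})
            parts.append(delta.get("content") or "")
    return "".join(parts)
```

The function accepts any iterable of decoded lines, so it works equally on a list in a test and on lines read from a live HTTP response.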

Per-request overrides. Customize the panel for a single request (like OpenRouter Fusion's "pass your own participant models and synthesizer") via a fusion block in the body. Only safe model-selection/flag keys are honored; the backend URL and keys can never be set from the request:

{
  "model": "fusion",
  "messages": [{"role": "user", "content": "..."}],
  "fusion": {
    "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
    "synthesizer": "anthropic/claude-opus-4.8",
    "refine": true,
    "layers": 2
  }
}
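
A small client-side builder can assemble a body like the one above and reject unknown override keys before anything is sent. In this sketch, build_request is a hypothetical helper and ALLOWED_OVERRIDES lists only the four keys shown in the example; the server's actual allow-list may be longer.

```python
import json

# Only the keys that appear in the example above (assumption: the server
# may honor others as well).
ALLOWED_OVERRIDES = {"panel", "synthesizer", "refine", "layers"}


def build_request(prompt: str, **overrides) -> str:
    """Return a JSON chat-completions body with an optional fusion block."""
    unknown = set(overrides) - ALLOWED_OVERRIDES
    if unknown:
        raise ValueError(f"unsupported override(s): {sorted(unknown)}")
    body = {
        "model": "fusion",
        "messages": [{"role": "user", "content": prompt}],
    }
    if overrides:
        body["fusion"] = overrides
    return json.dumps(body)
```

With the OpenAI Python SDK, a non-standard block like this can be passed through the extra_body argument of client.chat.completions.create, for example extra_body={"fusion": {"refine": True}}.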
fusion/        engine + server + harnesses
               ├─ providers · panel · judge · synthesize · fusion   (MoA engine)
               ├─ tools · server · streaming · schemas · pricing · config
               ├─ tui.py        (fusion chat — TUI harness)
               └─ agent.py · agent_tools.py · cli.py   (fusion code — agent harness)
eval/          DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)
configs/       panel presets (budget.yaml, frontier.yaml)
integrations/  harness adapters — Pi package, OpenAI-SDK example, adapter guide
scripts/       smoke.sh, verify_install.sh, verify_all.sh
docs/          architecture.md, parity.md
tests/         pytest suite — unit (providers mocked) + real-HTTP e2e (live servers), no API key

pip install -e ".[dev]"
pytest -q          # full suite, no network required

See docs/architecture.md for the design and docs/parity.md for the parity matrix & roadmap.

MIT — see LICENSE.
