Show HN: FusionHarness – An Open-source Mixture-of-Agents compound-model server

FusionHarness is an open-source Mixture-of-Agents compound-model server, released as a self-hostable alternative to OpenRouter's Fusion API. It fans a prompt out to a panel of LLMs in parallel, uses a judge to extract the structure of their answers, and has a synthesizer write a final answer grounded in that analysis. According to the project, the result beats any single panelist, and a panel of budget models can rival a frontier model at a fraction of the cost. The server speaks the OpenAI API, so it works with existing OpenAI clients and agent harnesses.

Open-source Mixture-of-Agents compound-model server: a self-hostable alternative to OpenRouter's Fusion API.

Fan a prompt out to a panel of LLMs in parallel, let a judge extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then have a synthesizer write one final answer grounded in that analysis. The result beats any single panelist, and a panel of budget models can rival a frontier model at a fraction of the cost.

It speaks the OpenAI API, so it drops into any existing OpenAI client: point `base_url` at fusionHarness and use the model slug `fusion`.

```
            ┌─────────────┐
 prompt ─►  │   fan-out   │ ─► model A ─┐
            │    panel    │ ─► model B ─┤  parallel, each tool-enabled
            └─────────────┘ ─► model C ─┘
                                        │
                                        ▼
                  ┌───────────┐     ┌──────────────┐
                  │   judge   │ ──► │ synthesizer  │ ─► final answer
                  │ structure │     │   grounded   │    + cost / latency
                  └───────────┘     └──────────────┘
```

Why it works: per OpenRouter's own ablation, ~¾ of the lift comes from synthesis and ~¼ from diversity.

```bash
# 1. Install
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# 2. Configure: one OpenRouter key reaches every model in the catalog
cp .env.example .env    # then put your key in FUSION_API_KEY

# 3. Run the OpenAI-compatible server (omit --config to use the built-in budget panel)
fusion serve --config configs/budget.yaml

# 4. Call it like any OpenAI endpoint
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"fusion","messages":[{"role":"user","content":"Compare CRDTs vs OT for collaborative editing."}]}'
```

From the OpenAI Python SDK (`pip install openai`):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
resp = client.chat.completions.create(
    model="fusion",
    messages=[{"role": "user", "content": "..."}],
)
print(resp.choices[0].message.content)
```

Or straight from the terminal, no server:

```bash
export FUSION_API_KEY=sk-or-...
fusion ask "What are the trade-offs between gRPC and REST?" --config configs/budget.yaml
```

Because fusion speaks OpenAI, it drops into any agent harness, or you can use our own.

```bash
# Our own TUI harness: streaming, multi-turn chat (/reset /stats /help /exit)
fusion chat --config configs/budget.yaml

# Pi (pi.dev): install the package, register the provider, point Pi at it
pi install ./integrations/pi
bash integrations/pi/install.sh
pi --model fusion
```

Adapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are in [integrations/](/jackulau/fusionHarness/blob/main/integrations/README.md). Verify the whole stack end-to-end with no API key:

```bash
scripts/smoke.sh --fake    # boots a key-free fake backend + the real server
```

fusionHarness is also its own agentic coding harness: like Claude Code, but the brain can convene the fusion panel. The agent reads, writes, and edits files, searches, and runs bash in a tool-use loop confined to a project root, and can call `council` to escalate a hard sub-question to the full panel.

```bash
fusion code "add a /version endpoint and a test for it" --root .
fusion code                                # interactive agent session
fusion code "refactor X" --plan            # write a plan first, then act
fusion code "delete dead code" --approve   # confirm each file/bash action
```

Each step is printed as it happens; the agent calls `finish` when the task is done and verified. Tools are confined to `--root` (default: cwd). For a hard sub-problem the agent can call the `council` tool, which convenes the full fusion panel and returns a synthesized answer. `--approve` gates every mutating tool (write/edit/bash); `--plan` makes it write a numbered plan before acting.

⚠️ Security: the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects, so run it on projects you trust, or in a container.

A config picks the panel, judge, and synthesizer.
Two presets ship in `configs/`:

| Preset | Panel | Use it for |
|---|---|---|
| `configs/budget.yaml` | Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro | frontier-ish quality at ~half the price |
| `configs/frontier.yaml` | Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro | beyond-frontier quality |

Custom panel:

```yaml
# my-panel.yaml
name: fusion
panel:
  - anthropic/claude-opus-4.8
  - openai/gpt-5.5
  - model: deepseek/deepseek-v4-pro   # long form allows per-model overrides
    temperature: 0.3
    tools: [web_search]
judge: openai/gpt-5.5
synthesizer: anthropic/claude-opus-4.8
temperature: 0.7
max_tokens: 4096
tools_enabled: false
```

Model slugs follow OpenRouter conventions (`vendor/model`). Point at a different backend with `FUSION_BASE_URL` (OpenAI, a local vLLM/Ollama server, Groq, Together: anything OpenAI-compatible). API keys come from the environment only (`FUSION_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY`), never from YAML.

All optional config fields (with defaults):

| Key | Default | What it does |
|---|---|---|
| `refine` | `false` | Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑). |
| `layers` | `1` | Multi-layer MoA: with `layers > 1`, proposers see the previous layer's drafts and improve before the final synthesis. |
| `samples` | `1` | Self-consistency: sample each proposer K times so the judge/synthesizer see more drafts. |
| `diversity` | `true` | Spread panelist temperatures so drafts differ (≈¼ of the lift). |
| `diversity_jitter` | `0.3` | How wide to spread temperatures (the MoA diversity↔quality trade-off; keep it modest). |
| `max_retries` | `2` | Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel. |
| `retry_backoff` | `0.5` | Base seconds for exponential retry backoff. |
| `max_concurrency` | `0` | Cap concurrent panelist calls (`0` = unlimited). |

If the judge fails, synthesis still runs from the raw responses; if the synthesizer fails, the best panelist's answer is returned. Anything that degraded is reported in the response's `fusion.degraded` list, never silently.
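The fallback rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not fusionHarness's actual code: `run_fusion`, the `judge` and `synthesize` callables, the `panel:<name>` labels, and the choice of the longest draft as the "best" panelist answer are all hypothetical. Only the idea of a `degraded` list that records every fallback comes from the description above.

```python
import asyncio
from typing import Awaitable, Callable

Panelist = Callable[[str], Awaitable[str]]


async def run_fusion(
    prompt: str,
    panel: dict[str, Panelist],
    judge: Callable[[list[str]], Awaitable[str]],
    synthesize: Callable[[str, list[str], str], Awaitable[str]],
) -> dict:
    """Fan out, judge, synthesize; record every fallback in `degraded`."""
    degraded: list[str] = []

    # Fan out to all panelists in parallel; failures are returned, not raised.
    results = await asyncio.gather(
        *(call(prompt) for call in panel.values()), return_exceptions=True
    )
    drafts: list[str] = []
    for name, result in zip(panel, results):
        if isinstance(result, Exception):
            degraded.append(f"panel:{name}")
        else:
            drafts.append(result)
    if not drafts:
        raise RuntimeError("every panelist failed")

    # Judge failure: synthesis still runs from the raw drafts.
    try:
        analysis = await judge(drafts)
    except Exception:
        analysis = ""
        degraded.append("judge")

    # Synthesizer failure: fall back to a single panelist's draft.
    try:
        answer = await synthesize(prompt, drafts, analysis)
    except Exception:
        answer = max(drafts, key=len)  # assumption: "best" means longest draft
        degraded.append("synthesizer")

    return {"answer": answer, "degraded": degraded}
```

Using `return_exceptions=True` keeps one failing panelist from cancelling the others, which matches the stated goal of reporting degradation rather than failing the whole request.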
Panelists can call tools while drafting, which is useful for deep-research tasks. Tools are off by default. Enable globally and per-model:

```yaml
tools_enabled: true
panel:
  - model: deepseek/deepseek-v4-pro
    tools: [web_search, bash]
```

- `web_search`: keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via `default_registry(search_fn=...)`.
- `bash`: runs in a sandboxed shell (timeout, stripped env, output truncation).

⚠️ Security: `bash` executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling `bash` with untrusted input. It is opt-in because it is dangerous.

Every response carries the real numbers. Non-streaming responses include a `fusion` block plus headers:

```jsonc
{
  "choices": [...],
  "usage": { "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801 },
  "fusion": {
    "config": "fusion",
    "panel_models": ["google/gemini-3-flash", "moonshotai/kimi-k2.6", "deepseek/deepseek-v4-pro"],
    "panel_succeeded": 3,
    "cost_usd": 0.0123,
    "cost_breakdown": [{ "model": "...", "role": "panel", "cost_usd": 0.004 }, ...],
    "timing_s": { "panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3 }
  }
}
```

Headers: `x-fusion-cost-usd`, `x-fusion-latency-s`. When the backend reports an authoritative per-call cost, that value is used instead of the local price table (`fusion/pricing.py`).
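On the client side, the accounting block can be read like any other JSON field. The sketch below assumes the field names from the sample response with underscore spellings (`cost_usd`, `timing_s`, `panel_succeeded`); `summarize_fusion` is a hypothetical helper written for this example, not part of fusionHarness.

```python
import json


def summarize_fusion(response: dict) -> str:
    """Return a one-line cost/latency summary from a response's `fusion` block."""
    fusion = response.get("fusion", {})
    timing = fusion.get("timing_s", {})
    total = timing.get("total", 0.0)
    # Share of wall-clock time spent in each stage, skipping the total itself.
    shares = [
        f"{stage} {seconds / total:.0%}"
        for stage, seconds in timing.items()
        if stage != "total" and total > 0
    ]
    return (
        f"${fusion.get('cost_usd', 0.0):.4f} for "
        f"{fusion.get('panel_succeeded', 0)} panelists in {total:.1f}s"
        + (f" ({', '.join(shares)})" if shares else "")
    )


# Values copied from the sample response in this document.
sample = json.loads("""
{
  "fusion": {
    "panel_succeeded": 3,
    "cost_usd": 0.0123,
    "timing_s": {"panel": 2.1, "judge": 0.8, "synth": 3.4, "total": 6.3}
  }
}
""")
print(summarize_fusion(sample))
```

A response without a `fusion` block (for example, from a plain OpenAI backend) falls through to zeros rather than raising, so the helper is safe to call on any chat completion payload.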
Reproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):

```bash
# deterministic stubs: no API key, proves the pipeline
fusion eval --dry-run

# A/B: solo vs panel vs panel+refine, with deltas vs the best solo
fusion eval --ab --dry-run

# live: grade with an LLM judge (needs an API key)
fusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml
```

```
Panel vs solo — scored on 3 task(s)
  panel+refine                  100.0%  ████████████████████  ★ +28.6 vs best solo
  panel                          76.2%  ███████████████       ★ +4.8 vs best solo
  google/gemini-3-flash (solo)   71.4%  ██████████████
```

`--runs N` repeats each task N times (self-consistency). Add your own tasks in `eval/tasks.sample.yaml` (id, prompt, weighted criteria). The dry-run numbers are from deterministic stubs; real lift needs a key, so run the live command above.

| Method | Path | Description |
|---|---|---|
| `POST` | `/v1/chat/completions` | OpenAI-compatible; `stream:true` supported. Model slug `fusion`. |
| `GET` | `/v1/models` | Lists `fusion` plus the configured panel models. |
| `GET` | `/health` | Liveness + active config + panel. |

Per-request overrides: customize the panel for a single request (like OpenRouter Fusion's "pass your own participant models and synthesizer") via a `fusion` block in the request body.
Only safe model-selection/flag keys are honored; the backend URL and keys can never be set from the request:

```json
{
  "model": "fusion",
  "messages": [{"role": "user", "content": "..."}],
  "fusion": {
    "panel": ["anthropic/claude-opus-4.8", "openai/gpt-5.5"],
    "synthesizer": "anthropic/claude-opus-4.8",
    "refine": true,
    "layers": 2
  }
}
```

```
fusion/         engine + server + harnesses
 ├─ providers · panel · judge · synthesize · fusion     (MoA engine)
 ├─ tools · server · streaming · schemas · pricing · config
 ├─ tui.py                                              (fusion chat: TUI harness)
 └─ agent.py · agent_tools.py · cli.py                  (fusion code: agent harness)
eval/           DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)
configs/        panel presets (budget.yaml, frontier.yaml)
integrations/   harness adapters: Pi package, OpenAI-SDK example, adapter guide
scripts/        smoke.sh, verify_install.sh, verify_all.sh
docs/           architecture.md, parity.md
tests/          pytest suite: unit (providers mocked) + real-HTTP e2e (live servers), no API key
```

```bash
pip install -e ".[dev]"
pytest -q    # full suite, no network required
```

See [docs/architecture.md](/jackulau/fusionHarness/blob/main/docs/architecture.md) for the design and [docs/parity.md](/jackulau/fusionHarness/blob/main/docs/parity.md) for the parity matrix & roadmap.

MIT. See [LICENSE](/jackulau/fusionHarness/blob/main/LICENSE).