{"slug": "show-hn-fusionharness-an-open-source-mixture-of-agents-compound-model-server", "title": "Show HN: FusionHarness – An Open-source Mixture-of-Agents compound-model server", "summary": "FusionHarness, an open-source mixture-of-agents compound-model server, was released as a self-hostable alternative to OpenRouter's Fusion API. It fans out prompts to a panel of LLMs in parallel, uses a judge to extract structure, and a synthesizer to produce a final answer, outperforming any single model and rivaling frontier models at lower cost. The server speaks the OpenAI API, enabling integration with existing clients and agent harnesses.", "body_md": "Open-source Mixture-of-Agents compound-model server — a self-hostable alternative to OpenRouter's Fusion API.\n\nFan a prompt out to a **panel** of LLMs in parallel, let a **judge** extract the structure of their answers (consensus, contradictions, partial coverage, unique insights), then a **synthesizer** writes one final answer grounded in that analysis. The result beats any single panelist — and a panel of *budget* models can rival a frontier model at a fraction of the cost.\n\nIt speaks the **OpenAI API**, so it drops into any existing OpenAI client: point `base_url` at fusionHarness and use the model slug `fusion`.\n\n```\n          ┌─────────────┐\nprompt ─► │   fan-out   │ ─► model A ─┐\n          │   (panel)   │ ─► model B ─┤  (parallel, each tool-enabled)\n          └─────────────┘ ─► model C ─┘\n                                  │\n                                  ▼\n                            ┌───────────┐     ┌──────────────┐\n                            │   judge   │ ──► │ synthesizer  │ ─► final answer\n                            │ structure │     │  grounded    │    + cost / latency\n                            └───────────┘     └──────────────┘\n```\n\nWhy it works (OpenRouter's own ablation): ~¾ of the lift comes from synthesis, ~¼ from diversity.\n\n```\n# 1. 
Install\npython -m venv .venv && source .venv/bin/activate\npip install -e \".[dev]\"\n\n# 2. Configure — one OpenRouter key reaches every model in the catalog\ncp .env.example .env       # then put your key in FUSION_API_KEY\n\n# 3. Run the OpenAI-compatible server (omit --config to use the built-in budget panel)\nfusion serve --config configs/budget.yaml\n\n# 4. Call it like any OpenAI endpoint\ncurl http://localhost:8000/v1/chat/completions \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\":\"fusion\",\"messages\":[{\"role\":\"user\",\"content\":\"Compare CRDTs vs OT for collaborative editing.\"}]}'\n```\n\nFrom the OpenAI Python SDK (`pip install openai`):\n\n```python\nfrom openai import OpenAI\nclient = OpenAI(base_url=\"http://localhost:8000/v1\", api_key=\"unused\")\nresp = client.chat.completions.create(\n    model=\"fusion\",\n    messages=[{\"role\": \"user\", \"content\": \"...\"}],\n)\nprint(resp.choices[0].message.content)\n```\n\nOr straight from the terminal, no server:\n\n```\nexport FUSION_API_KEY=sk-or-...\nfusion ask \"What are the trade-offs between gRPC and REST?\" --config configs/budget.yaml\n```\n\nBecause fusion speaks the OpenAI API, it drops into any agent harness — or use our own.\n\n```\n# Our own TUI harness — streaming, multi-turn chat (/reset /stats /help /exit)\nfusion chat --config configs/budget.yaml\n\n# Pi (pi.dev): install the package, register the provider, point Pi at it\npi install ./integrations/pi\nbash integrations/pi/install.sh\npi --model fusion\n```\n\nAdapters for Pi, Claude Code, aider, Continue, LangChain, and the OpenAI SDK are\nin [integrations/](/jackulau/fusionHarness/blob/main/integrations/README.md). Verify the whole stack end-to-end with\nno API key:\n\n```\nscripts/smoke.sh --fake     # boots a key-free fake backend + the real server\n```\n\nfusionHarness is also its own **agentic coding harness** — like Claude Code, but\nthe brain can convene the fusion panel. 
The agent reads, writes, and edits files,\nsearches, and runs bash in a tool-use loop confined to a project root, and can\ncall `council` to escalate a hard sub-question to the full panel.\n\n```\nfusion code \"add a /version endpoint and a test for it\" --root .\nfusion code                       # interactive agent session\nfusion code \"refactor X\" --plan   # write a plan first, then act\nfusion code \"delete dead code\" --approve   # confirm each file/bash action\n```\n\nEach step is printed as it happens; the agent calls `finish` when the task is\ndone and verified. Tools are confined to `--root` (default: cwd). For a hard\nsub-problem the agent can call the `council` tool, which convenes the full fusion\npanel and returns a synthesized answer. `--approve` gates every mutating tool\n(write/edit/bash); `--plan` makes it write a numbered plan before acting.\n\n⚠️ Security: the agent runs bash and edits files. Confinement blocks path escapes, not arbitrary command effects — run it on projects you trust, or in a container.\n\nA config picks the panel, judge, and synthesizer. Two presets ship in `configs/`:\n\n| Preset | Panel | Use it for |\n|---|---|---|\n| `configs/budget.yaml` | Gemini 3 Flash · Kimi K2.6 · DeepSeek V4 Pro | frontier-ish quality at ~half the price |\n| `configs/frontier.yaml` | Opus 4.8 · GPT-5.5 · Gemini 3.1 Pro | beyond-frontier quality |\n\nCustom panel:\n\n```\n# my-panel.yaml\nname: fusion\npanel:\n  - anthropic/claude-opus-4.8\n  - openai/gpt-5.5\n  - model: deepseek/deepseek-v4-pro    # long form allows per-model overrides\n    temperature: 0.3\n    tools: [web_search]\njudge: openai/gpt-5.5\nsynthesizer: anthropic/claude-opus-4.8\ntemperature: 0.7\nmax_tokens: 4096\ntools_enabled: false\n```\n\nModel slugs follow OpenRouter conventions (`vendor/model`). Point at a different\nbackend with `FUSION_BASE_URL` (OpenAI, a local vLLM/Ollama server, Groq,\nTogether — anything OpenAI-compatible). 
API keys come from the environment only\n(`FUSION_API_KEY`, `OPENROUTER_API_KEY`, or `OPENAI_API_KEY`), never from YAML.\n\nAll optional config fields (with defaults):\n\n| Key | Default | What it does |\n|---|---|---|\n| `refine` | `false` | Run one extra self-critique pass over the synthesized answer (quality ↑, cost ↑). |\n| `layers` | `1` | Multi-layer MoA — with `layers>1`, proposers see the previous layer's drafts and improve before the final synthesis. |\n| `samples` | `1` | Self-consistency — sample each proposer K times so the judge/synthesizer see more drafts. |\n| `diversity` | `true` | Spread panelist temperatures so drafts differ (≈¼ of the lift). |\n| `diversity_jitter` | `0.3` | How wide to spread temperatures (the MoA diversity↔quality trade-off — keep it modest). |\n| `max_retries` | `2` | Retry transient upstream failures (429/5xx/timeout) so a flaky panelist doesn't shrink the panel. |\n| `retry_backoff` | `0.5` | Base seconds for exponential retry backoff. |\n| `max_concurrency` | `0` | Cap concurrent panelist calls (0 = unlimited). |\n\nIf the **judge** fails, synthesis still runs from the raw responses; if the\n**synthesizer** fails, the best panelist's answer is returned. Anything that\ndegraded is reported in the response's `fusion.degraded` list — never silently.\n\nPanelists can call tools while drafting — useful for deep-research tasks. Tools\nare **off by default**. Enable globally and per-model:\n\n```\ntools_enabled: true\npanel:\n  - model: deepseek/deepseek-v4-pro\n    tools: [web_search, bash]\n```\n\n- `web_search` — keyless DuckDuckGo Instant Answer by default; swap in a Tavily/Brave/SerpAPI backend via `default_registry(search_fn=...)`.\n- `bash` — runs in a sandboxed shell (timeout, stripped env, output truncation).\n\n⚠️ Security: `bash` executes commands the model writes. The sandbox is not a container. Run the server in a disposable VM/container before enabling `bash` with untrusted input. 
It is opt-in because it is dangerous.\n\nEvery response carries the real numbers. Non-streaming responses include a\n`fusion` block plus headers:\n\n```\n{\n  \"choices\": [ ... ],\n  \"usage\": { \"prompt_tokens\": 1234, \"completion_tokens\": 567, \"total_tokens\": 1801 },\n  \"fusion\": {\n    \"config\": \"fusion\",\n    \"panel_models\": [\"google/gemini-3-flash\", \"moonshotai/kimi-k2.6\", \"deepseek/deepseek-v4-pro\"],\n    \"panel_succeeded\": 3,\n    \"cost_usd\": 0.0123,\n    \"cost_breakdown\": [ { \"model\": \"...\", \"role\": \"panel\", \"cost_usd\": 0.004 }, ... ],\n    \"timing_s\": { \"panel\": 2.1, \"judge\": 0.8, \"synth\": 3.4, \"total\": 6.3 }\n  }\n}\n```\n\nHeaders: `x-fusion-cost-usd`, `x-fusion-latency-s`. When the backend reports an\nauthoritative per-call cost, that value is used instead of the local price table\n(`fusion/pricing.py`).\n\nReproduce the panel-vs-solo comparison on DRACO-style weighted tasks (negative criteria penalize wrong claims, so you can't bluff a high score):\n\n```\n# deterministic stubs — no API key, proves the pipeline\nfusion eval --dry-run\n\n# A/B: solo vs panel vs panel+refine, with deltas vs the best solo\nfusion eval --ab --dry-run\n\n# live: grade with an LLM judge (needs an API key)\nfusion eval --config configs/budget.yaml --tasks eval/tasks.sample.yaml\nPanel vs solo — scored on 3 task(s)\n\npanel+refine                     100.0%  ████████████████████  ★ (+28.6 vs best solo)\npanel                             76.2%  ███████████████  ★ (+4.8 vs best solo)\ngoogle/gemini-3-flash (solo)      71.4%  ██████████████\n```\n\n`--runs N` repeats each task N times (self-consistency). Add your own tasks in\n`eval/tasks.sample.yaml` (id, prompt, weighted criteria). 
The dry-run numbers are\nfrom deterministic stubs — **real** lift needs a key; run the live command above.\n\n| Method | Path | Description |\n|---|---|---|\n| `POST` | `/v1/chat/completions` | OpenAI-compatible; `stream:true` supported. Model slug `fusion`. |\n| `GET` | `/v1/models` | Lists `fusion` plus the configured panel models. |\n| `GET` | `/health` | Liveness + active config + panel. |\n\n**Per-request overrides.** Customize the panel with per-request overrides (like\nOpenRouter Fusion's \"pass your own participant models and synthesizer\") via a\n`fusion` block in the body. Only safe model-selection/flag keys are honored — the\nbackend URL and keys can never be set from the request:\n\n```\n{\n  \"model\": \"fusion\",\n  \"messages\": [{\"role\": \"user\", \"content\": \"...\"}],\n  \"fusion\": {\n    \"panel\": [\"anthropic/claude-opus-4.8\", \"openai/gpt-5.5\"],\n    \"synthesizer\": \"anthropic/claude-opus-4.8\",\n    \"refine\": true,\n    \"layers\": 2\n  }\n}\n```\n\n```\nfusion/        engine + server + harnesses\n               ├─ providers · panel · judge · synthesize · fusion   (MoA engine)\n               ├─ tools · server · streaming · schemas · pricing · config\n               ├─ tui.py        (fusion chat — TUI harness)\n               └─ agent.py · agent_tools.py · cli.py   (fusion code — agent harness)\neval/          DRACO-style evaluation harness (scorer, harness, tasks.sample.yaml)\nconfigs/       panel presets (budget.yaml, frontier.yaml)\nintegrations/  harness adapters — Pi package, OpenAI-SDK example, adapter guide\nscripts/       smoke.sh, verify_install.sh, verify_all.sh\ndocs/          architecture.md, parity.md\ntests/         pytest suite — unit (providers mocked) + real-HTTP e2e (live servers), no API key\n```\n\n```\npip install -e \".[dev]\"\npytest -q          # full suite, no network required\n```\n\nSee [docs/architecture.md](/jackulau/fusionHarness/blob/main/docs/architecture.md) for the design 
and\n[docs/parity.md](/jackulau/fusionHarness/blob/main/docs/parity.md) for the parity matrix & roadmap.\n\nMIT — see [LICENSE](/jackulau/fusionHarness/blob/main/LICENSE).", "url": "https://wpnews.pro/news/show-hn-fusionharness-an-open-source-mixture-of-agents-compound-model-server", "canonical_source": "https://github.com/jackulau/fusionHarness", "published_at": "2026-06-17 14:59:07+00:00", "updated_at": "2026-06-17 15:23:22.138808+00:00", "lang": "en", "topics": ["ai-agents", "ai-infrastructure", "ai-tools", "large-language-models"], "entities": ["FusionHarness", "OpenRouter", "OpenAI", "Pi", "Claude Code", "aider", "Continue", "LangChain"], "alternates": {"html": "https://wpnews.pro/news/show-hn-fusionharness-an-open-source-mixture-of-agents-compound-model-server", "markdown": "https://wpnews.pro/news/show-hn-fusionharness-an-open-source-mixture-of-agents-compound-model-server.md", "text": "https://wpnews.pro/news/show-hn-fusionharness-an-open-source-mixture-of-agents-compound-model-server.txt", "jsonld": "https://wpnews.pro/news/show-hn-fusionharness-an-open-source-mixture-of-agents-compound-model-server.jsonld"}}