Source: github.com · Topic: large-language-models

Show HN: Openfusion - enhanced results from a panel of models

Openfusion, an open-source drop-in compound-model proxy, lets users point any OpenAI-compatible tool at it; prompts are fanned out to a panel of LLMs in parallel, and a judge model then synthesizes a single answer. The project aims to improve answer quality by combining multiple models, offering a tunable alternative to OpenRouter's Fusion. It includes a terminal chat and a web playground, and supports presets like 'quality' and 'budget'.

12 min read · 1 view · published Jun 18, 2026

An open-source, drop-in compound-model proxy. Point any OpenAI-compatible tool at it, set model: "openfusion", and your prompt is fanned out to a panel of LLMs in parallel. A judge model then reads every response (consensus, contradictions, blind spots) and streams back a single synthesized answer that aims to beat any one of them.

It's the open version of the mixture-of-agents idea behind OpenRouter's Fusion: better answers from models you already pay for, as a tunable, forkable recipe instead of a black box.

Quick start · How it works · Playground · Routing & strategies · vs. OpenRouter Fusion · Benchmarks · Contributing

New here? You only need the first two to run it; the rest is for tuning and contributing.

| Path | What it is |
| --- | --- |
| openfusion/ | The proxy (FastAPI). Start with server.py. |
| web/ | The playground UI source (React + shadcn). Built assets ship in openfusion/static/. |
| examples/ | Copy-paste config recipes (preset, dev, panel, bench…). You don't need a config to start. |
| bench/ | Reproducible head-to-head harness; bench/FINDINGS.md is where fusion does and doesn't pay off. |
| DESIGN.md · docs/ | Design rationale, architecture, and security notes. |

Beta: panel fan-out, judge synthesis, SSE streaming, web-tool fusion, an Auto Router, debate/vote/ranked aggregators, production limits, and an interactive playground. See DESIGN.md and docs/ARCHITECTURE.md for architecture and security notes.

openfusion has two front ends: an interactive terminal chat and a web playground. No clone, no config, no env vars needed to start.

uvx --from git+https://github.com/shahar-dagan/openfusion openfusion   # ephemeral, needs uv

Bare openfusion drops you into a Rich-rendered chat with the model panel: a banner, a live panel-progress spinner, Markdown answers with syntax-highlighted code, and slash commands (/preset, /tokens, /models, /key, /clear). On first run it asks for your OpenRouter key and saves it (~/.config/openfusion/credentials), so later runs don't re-prompt; use /key to change it. Pipe for one-shots: echo "…" | openfusion.

openfusion web                                  # opens the playground in your browser

openfusion web pops the playground open at http://localhost:8000 once the server is ready (pass --no-open to suppress this; it is skipped automatically in non-interactive/headless/Docker contexts). Paste your key (kept only in server memory) and fuse. With nothing configured it boots the Budget preset (a diverse panel + judge with web search) so the first run lands where fusion actually wins.

uv tool install .     # from a clone, or: pipx install . && pipx ensurepath

For active development, run pip install -e . inside an activated venv (the command then works only while that venv is active). A bare pip install -e . does not put openfusion on your global PATH; see Troubleshooting.

For a fixed recipe, write an openfusion.yaml (start from examples/preset.yaml.example, which sets preset: quality | budget, or examples/default.yaml.example for a fully spelled-out panel/judge). A preset expands to a diverse OpenRouter panel + judge with web tools on, mirroring OpenRouter Fusion's Quality/Budget switch:

| Preset | Panel | Judge | Tools |
| --- | --- | --- | --- |
| quality | Claude Sonnet 4 · Gemini 3 Pro · DeepSeek V4 Pro | Claude Sonnet 4 | web search + fetch |
| budget | GPT-4o-mini · DeepSeek V4 Pro · Kimi K2.6 | DeepSeek V4 Pro | web search + fetch |

Use as a drop-in API from the OpenAI SDK (with openfusion web running):

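from openai import OpenAI

# base_url points the SDK at the local proxy instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local-dev")
stream = client.chat.completions.create(
    model="openfusion",
    messages=[{"role": "user", "content": "Explain mixture-of-agents in one paragraph."}],
    stream=True,
)
for chunk in stream:
    # Some chunks (for example usage-only ones) may carry no choices.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
print()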

Or straight from the terminal, no server needed:

openfusion ask "Compare Postgres and SQLite for a small SaaS." --max-tokens 800

ask runs one fusion against your configured panel and streams the synthesized answer to stdout (panel progress goes to stderr). --max-tokens caps every call; lower is faster and cheaper.

Speed & length. Fusion runs N panel calls plus a judge, so it's slower than one model; the panel runs in parallel and the judge streams as soon as the panel finishes. The judge is prompted to stay concise, and you cap length with --max-tokens (CLI), max_tokens (API), the response-length control in the playground Settings, or cost_controls in config.

Three knobs control whether and how a prompt is fused. All are optional and off/default.

Auto Router (router.enabled: true) is a per-prompt gate that answers simple prompts with a single pass-through call and reserves the panel for prompts that look like they benefit (long, analytical, or containing code). Default is a cheap heuristic (no extra model call); mode: model uses a small classifier model and falls back to the heuristic if it errors:

router:
  enabled: true
  mode: heuristic     # heuristic | model | always | never
  min_chars: 280      # prompts at/over this length fuse
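The heuristic gate described above can be sketched in a few lines. This is only an illustration: the function name should_fuse and both regular expressions are hypothetical, and the project's real signals for "analytical" or "containing code" may differ. Only the mode names and the min_chars threshold come from the config shown above; mode: model is left out because it needs a classifier call.

```python
import re

# Hypothetical signals; the project's real heuristic may differ.
CODE_HINT = re.compile(r"```|\bdef \w+\(|\bclass \w+|[{};]\s*$", re.MULTILINE)
ANALYTICAL_HINT = re.compile(
    r"\b(compare|trade-?offs?|analy[sz]e|evaluate|pros and cons)\b", re.IGNORECASE
)


def should_fuse(prompt: str, mode: str = "heuristic", min_chars: int = 280) -> bool:
    """Return True when the prompt should go to the panel, False for pass-through."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode != "heuristic":
        raise ValueError(f"mode not covered by this sketch: {mode!r}")
    if len(prompt) >= min_chars:  # prompts at/over this length fuse
        return True
    # Short prompts still fuse when they look analytical or contain code.
    return bool(CODE_HINT.search(prompt) or ANALYTICAL_HINT.search(prompt))
```

Under these assumptions a greeting passes straight through, while a short "Compare Postgres and SQLite" prompt is still routed to the panel because it looks analytical.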

Strategy (strategy:) sets how the panel is produced: self_fusion (one model sampled N times), panel (a fixed diverse panel), or debate (a diverse panel where each member revises after seeing the others' answers, then the judge synthesizes). Debate trades extra cost/latency for cross-examination:

strategy: debate
debate:
  rounds: 1           # revision rounds before the judge
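To make the debate round concrete, here is a minimal sketch of the revise-after-seeing-others loop. The Model type, the debate function, and the revision prompt wording are all hypothetical stand-ins; a real member would be an upstream chat-completions call, and the judge step that follows is omitted.

```python
from typing import Callable

# A "model" here is any callable mapping a prompt to an answer
# (hypothetical stand-in for an upstream chat-completions call).
Model = Callable[[str], str]


def debate(prompt: str, panel: dict[str, Model], rounds: int = 1) -> dict[str, str]:
    """Collect first drafts, then let each member revise after seeing the others."""
    answers = {name: model(prompt) for name, model in panel.items()}
    for _ in range(rounds):
        revised = {}
        for name, model in panel.items():
            others = "\n".join(f"- {n}: {a}" for n, a in answers.items() if n != name)
            revised[name] = model(
                f"{prompt}\n\nOther panelists answered:\n{others}\n\nRevise your answer."
            )
        answers = revised  # every member revises against the same previous round
    return answers
```

Each extra round costs one more call per panel member, which is where the added cost and latency of debate come from.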

Aggregator (aggregator:) sets how answers become one: judge (synthesis, default), vote (majority vote, cheaper, best for verifiable short-answer tasks), or ranked (one short judge call picks the single best answer; cheaper than synthesis, and unlike vote it uses model judgment).

  • Analysis transparency (analysis.emit: true): surface the judge's structured reasoning (consensus / contradictions / partial coverage / unique insights / blind spots) as a separate SSE event: analysis (and an analysis field on non-streaming responses), without polluting the answer body.
  • Prompt caching (cache.enabled: true): mark the shared prefix so self-fusion's N samples reuse a cached prompt on providers that support it (a no-op elsewhere).
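Of the aggregators above, vote is the simplest to illustrate, since it needs no model call. The sketch below is an assumption about how a majority vote over short answers could work; the normalize step and the tie-breaking rule are illustrative choices, not the project's documented behavior.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Make short answers comparable: trim, lowercase, drop a trailing period."""
    return answer.strip().lower().rstrip(".")


def majority_vote(answers: list[str]) -> str:
    """Return the most common normalized answer; ties go to the earliest one seen."""
    if not answers:
        raise ValueError("need at least one panel answer")
    counts = Counter(normalize(a) for a in answers)
    winner, _ = counts.most_common(1)[0]
    return winner
```

This also shows why vote suits verifiable short-answer tasks: long free-form answers rarely normalize to the same string, so there would be nothing to count.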

For public deployments, bound load and spend (both default to 0 = unlimited):

limits:
  max_in_flight: 64           # cap concurrent requests; over-limit returns 503
  rate_limit_per_minute: 60   # per gateway key (or per client when unauthenticated); over-limit returns 429

These are best-effort, single-process guards; pair them with provider-side budgets and, for multi-replica deployments, an edge rate limiter.
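A single-process guard of this kind can be sketched with a counter and a per-key sliding window. The Limits class, its method names, and the order of the two checks are hypothetical; only the two settings, the 0 = unlimited convention, and the 503/429 status codes come from the config above.

```python
import time
from collections import defaultdict, deque


class Limits:
    """Best-effort, in-memory guards; 0 means unlimited, as in the config above."""

    def __init__(self, max_in_flight: int = 0, rate_limit_per_minute: int = 0,
                 clock=time.monotonic):
        self.max_in_flight = max_in_flight
        self.rate_limit_per_minute = rate_limit_per_minute
        self.clock = clock
        self.in_flight = 0
        self.hits = defaultdict(deque)  # key -> admission timestamps in the last 60 s

    def admit(self, key: str) -> int:
        """Return 200 if admitted, else the status code a proxy would send."""
        now = self.clock()
        window = self.hits[key]
        while window and now - window[0] >= 60:
            window.popleft()  # forget admissions older than one minute
        if self.rate_limit_per_minute and len(window) >= self.rate_limit_per_minute:
            return 429
        if self.max_in_flight and self.in_flight >= self.max_in_flight:
            return 503
        window.append(now)
        self.in_flight += 1
        return 200

    def release(self) -> None:
        """Call when an admitted request finishes."""
        self.in_flight -= 1
```

Because the state lives in one process's memory, two replicas would each allow the full quota, which is why an edge rate limiter is still needed for multi-replica deployments.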

A request to model: "openfusion" is fanned out to a panel of models in parallel (each optionally doing its own web research), then a judge model reads every answer and synthesizes one, streamed back over SSE with the structured analysis and cost alongside.

flowchart LR
    C["Client<br/>(Cursor · OpenAI SDK · anything)"] -->|"POST /v1/chat/completions<br/>model=openfusion"| R{"Router<br/><i>(optional)</i>"}
    R -->|simple prompt| S["Single model"] --> OUT
    R -->|worth fusing| P

    subgraph P ["Panel · parallel fan-out"]
        direction TB
        A["Model A 🔍"]
        B["Model B 🔍"]
        D["Model C 🔍"]
    end

    P --> J["Judge<br/>consensus · contradictions · blind spots"]
    J --> OUT["Streamed answer (SSE)<br/>+ analysis + token/cost"]
    C -.->|other model / client tools| S

    classDef accent fill:#eef2ff,stroke:#4f46e5,color:#3730a3;
    class J,R accent;
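The panel-then-judge path in the diagram can be sketched with asyncio. This is a minimal illustration under stated assumptions: fuse, AsyncModel, and the judge prompt wording are hypothetical names, each "model" stands in for an upstream chat-completions call, and streaming, web tools, and cost accounting are left out.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for one upstream chat-completions call.
AsyncModel = Callable[[str], Awaitable[str]]


async def fuse(prompt: str, panel: dict[str, AsyncModel], judge: AsyncModel) -> str:
    """Query every panel member concurrently, then ask the judge to synthesize."""
    names = list(panel)
    results = await asyncio.gather(
        *(panel[name](prompt) for name in names), return_exceptions=True
    )
    # Keep successful answers; a failed member degrades the panel
    # instead of failing the whole request.
    answers = {n: r for n, r in zip(names, results) if not isinstance(r, BaseException)}
    if not answers:
        raise RuntimeError("every panel member failed")
    transcript = "\n\n".join(f"[{n}]\n{a}" for n, a in answers.items())
    return await judge(
        f"Question:\n{prompt}\n\nPanel answers:\n{transcript}\n\n"
        "Note consensus, contradictions, and blind spots, then write one answer."
    )
```

Since the members run concurrently, panel latency is roughly that of the slowest member rather than the sum, and the judge call can only start once the gather completes.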

  • Drop-in. OpenAI-compatible POST /v1/chat/completions + /v1/models, real SSE streaming.
  • No lock-in. Each panel member + judge is {base_url, api_key, model}. OpenRouter is the default upstream; OpenAI, Together, local vLLM/Ollama all work.
  • Config-driven. Panel, judge, strategy, aggregator, router, and limits live in openfusion.yaml, or a one-word preset, or nothing at all (zero-config quick start).

openfusion is the open implementation of the same idea. The core mechanism is at parity; the differences are scale and a per-prompt router.

| Feature | OpenRouter Fusion | openfusion |
| --- | --- | --- |
| Parallel panel → judge synthesis | ✅ | ✅ |
| Synthesis dimensions | consensus · contradictions · partial coverage · unique insights · blind spots | same |
| Web search + fetch on the panel | ✅ (default) | ✅ (on by default with preset: ) |
| Quality / Budget presets | ✅ | ✅ (preset: quality …) |
| Override panel + judge | ✅ (plugin fields) | ✅ (any {base_url, api_key, model} in YAML) |
| Per-call cost breakdown | ✅ (Activity) | ✅ (SSE usage event + /metrics) |
| Self-hostable / forkable | ❌ closed API | ✅ MIT, any OpenAI-compatible provider |
| Per-prompt Auto Router | ✅ | ✅ heuristic or model classifier (router.enabled) |
| Structured analysis surfaced | ✅ | ✅ analysis.emit (SSE analysis event) |
| Multi-round debate | - | ✅ strategy: debate |
| Concurrency cap + rate limiting | ✅ | ✅ limits (best-effort, single-process) |
| Interactive web playground | ✅ | ✅ embedded at /playground (zero-build) |
| Headline benchmark | full DRACO (100 tasks) | DRACO subset (10 tasks) |


| Parameter | Applies to | Notes |
| --- | --- | --- |
| temperature (client) | Judge only indirectly via recipe | Self-fusion varies panel temps from config, not client |
| max_tokens, stop, response_format | Judge (visible output) | Panel members use recipe defaults |
| stream, stream_options | Judge path | Panel always runs non-streamed internally |
| tools / tool_calls | Fusion or pass-through | Server-executable web tools (openrouter:web_search / web_fetch) are fused; client-side function tools and mid-conversation tool turns pass through |


| Variable | Purpose |
| --- | --- |
| OPENROUTER_API_KEY | Default upstream key (via ${OPENROUTER_API_KEY} in config) |
| OPENFUSION_CONFIG | Path to config file (default: openfusion.yaml) |
| OPENFUSION_API_KEYS | Comma-separated gateway allowlist (optional) |
| OPENFUSION_HOST / OPENFUSION_PORT | Server bind address |

cost_controls in config caps max_tokens for pass-through, panel, and judge calls. Missing max_tokens values are filled from the configured ceiling; over-limit pass-through and judge requests return 400, while internal panel calls clamp to their ceiling.
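The fill, reject, and clamp rules in this paragraph fit in one small function. The name resolve_max_tokens and the internal flag are hypothetical, and a ValueError stands in for the HTTP 400 the proxy returns; this is a sketch of the described behavior, not the project's code.

```python
from typing import Optional


def resolve_max_tokens(requested: Optional[int], ceiling: int, internal: bool = False) -> int:
    """Apply a max_tokens ceiling the way the paragraph above describes."""
    if requested is None:
        return ceiling  # missing values are filled from the configured ceiling
    if requested <= ceiling:
        return requested
    if internal:
        return ceiling  # internal panel calls clamp silently
    # Client-facing pass-through and judge requests are rejected (HTTP 400 in the proxy).
    raise ValueError(f"max_tokens {requested} exceeds ceiling {ceiling}")
```

Rejecting client-facing requests while clamping internal ones keeps the caller informed about limits it chose itself, without letting a recipe default break a fusion run.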

Run the opt-in live OpenRouter smoke test only when you intend to spend a small number of credits:

export OPENROUTER_API_KEY=your-key
python scripts/openrouter_smoke.py --config examples/dev.yaml.example --yes-spend-credits

Run the head-to-head benchmark (self-fusion vs solo model):

pip install -e ".[dev]"
python bench/run.py --config examples/default.yaml.example --tasks bench/tasks/sample.jsonl

Use --tasks bench/tasks/smoke.jsonl --max-tokens 32 before larger benchmark runs.

Each run reports accuracy plus the spend it took to get there (total_tokens and total_cost_usd per mode), so you can weigh any accuracy change against the extra cost of fanning out to a panel.

The bundled bench/tasks/sample.jsonl (20 short Q&A tasks) is saturated for a capable model: the solo baseline already scores ~100%, so there is no headroom for fusion to add accuracy. On a recent run with openai/gpt-4o-mini (self-fusion N=2, max_tokens=32):

| Mode | Accuracy | Avg latency | Tokens | Cost |
| --- | --- | --- | --- | --- |
| Solo | 100% (20/20) | 0.55s | 536 | $0.0001 |
| Self-fusion | 95% (19/20) | 1.40s | 4,669 | $0.0008 |

So on easy tasks fusion does not beat a single call. It costs more (here ~9× the tokens) and can even regress, because the judge only has trivially-correct answers to choose between. This is expected: mixture-of-agents helps where a single model is unreliable, not where it is already right.

openfusion makes no "beats frontier" claim. Demonstrating where fusion earns its cost needs a harder eval (one the solo baseline does not already ace) scored on quality per dollar, not accuracy alone. That eval is in progress; this table will be updated to show where fusion does and doesn't pay off. Claim only what your own bench/run.py run proves on your model and tasks.
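As a worked example of scoring on quality per dollar, the figures from the sample run can be turned into a cost-normalized number. The metric here (correct answers per USD) is just one possible proxy chosen for illustration, and the reported costs are rounded to four decimals, so the ratios are approximate.

```python
# Figures copied from the sample benchmark run reported above.
runs = {
    "solo": {"correct": 20, "total": 20, "tokens": 536, "cost_usd": 0.0001},
    "self_fusion": {"correct": 19, "total": 20, "tokens": 4669, "cost_usd": 0.0008},
}


def accuracy(run: dict) -> float:
    return run["correct"] / run["total"]


def correct_per_dollar(run: dict) -> float:
    """One possible 'quality per dollar' proxy: correct answers per USD spent."""
    return run["correct"] / run["cost_usd"]


token_ratio = runs["self_fusion"]["tokens"] / runs["solo"]["tokens"]
for name, run in runs.items():
    print(f"{name}: accuracy={accuracy(run):.0%}, "
          f"correct per $={correct_per_dollar(run):,.0f}")
print(f"token ratio: {token_ratio:.1f}x")
```

On this saturated task set the solo call wins on both accuracy and cost, which is the point the paragraph makes: the comparison only becomes interesting on tasks the solo baseline gets wrong.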

The proxy exposes Prometheus metrics at GET /metrics (no auth; scrape-only, bind accordingly):

  • openfusion_requests_total{route,outcome}: client-facing requests (fusion / pass_through).
  • openfusion_upstream_requests_total{phase,outcome}: upstream calls by panel / judge / pass_through.
  • openfusion_panel_members_total{outcome}: per-member success vs. degraded failures.
  • openfusion_tokens_total{phase,kind} and openfusion_cost_usd_total{phase}: token and cost spend.
  • openfusion_request_latency_ms / openfusion_upstream_latency_ms: latency summaries (_count + _sum).

Cost (usage.cost, when the upstream reports it) is also rolled into the per-request SSE event: usage payload and the non-streaming usage field, so a single fusion call shows what it spent across the panel and judge. Per-call structured logs remain on the openfusion.upstream logger.
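A client that wants the spend figure has to pick the named usage event out of the SSE stream. The sketch below parses a raw SSE body into (event, data) pairs; parse_sse is a hypothetical helper, and the sample payload is invented for illustration. Only the usage event name and the existence of a cost figure come from the text above, so the JSON field names are assumptions.

```python
import json


def _value(line: str, field: str) -> str:
    """Return the text after 'field:', dropping one optional leading space."""
    value = line[len(field) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(stream_text: str) -> list[tuple[str, str]]:
    """Split an SSE body into (event, data) pairs; unnamed events are 'message'."""
    events = []
    for block in stream_text.replace("\r\n", "\n").split("\n\n"):
        event, data = "message", []
        for line in block.split("\n"):
            if line.startswith("event:"):
                event = _value(line, "event")
            elif line.startswith("data:"):
                data.append(_value(line, "data"))
        if data:
            events.append((event, "\n".join(data)))
    return events


# Hypothetical stream: the JSON shapes below are assumptions for illustration.
sample = (
    'data: {"choices": [{"delta": {"content": "Hi"}}]}\n\n'
    'event: usage\ndata: {"total_tokens": 4669, "cost": 0.0008}\n\n'
    "data: [DONE]\n\n"
)
usage = [json.loads(d) for e, d in parse_sse(sample) if e == "usage"]
print(usage[0]["cost"])
```

Clients that only read unnamed data: lines, as most OpenAI-compatible SDKs do, will simply never see the named event, which is presumably why the cost is also available on the non-streaming usage field.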

The server hosts an interactive playground at GET /playground (and GET / redirects there). It's a React + Tailwind + shadcn UI whose built assets ship in the package (no Node needed to run); it talks only to the local /v1 API, so provider keys never reach the browser. You can:

  • paste your OpenRouter API key on first run (held only in server memory; enabled by allow_ui_api_key, on for the zero-config quick start),
  • pick a Quality / Budget / Custom panel and a "Fuse with" judge model,
  • toggle web search, send a prompt, and watch the panel → synthesis progress,
  • read the streamed answer plus the judge's structured analysis (consensus / contradictions / blind spots) and the token + cost breakdown.

The model selectors are editable when the server sets allow_request_overrides: true (on for the quick start), which enables the per-request openfusion: { preset | panel | judge | tools } field (mirroring OpenRouter Fusion's analysis_models / model plugin fields). Overrides reuse the server's upstream credentials (clients choose model ids, never keys) and stay bounded by gateway auth, cost ceilings, and rate limits. Read GET /v1/config for the active panel/judge and flags.
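For illustration, a request body that uses the per-request field might be built as follows. The helper build_request is hypothetical, and placing the override as a top-level "openfusion" object with a "preset" key is an assumption drawn from the field description above; the accepted shapes for panel, judge, and tools are not shown in the source, so they are left out.

```python
import json


def build_request(prompt: str, preset: str) -> dict:
    """Build a chat-completions body that asks for a preset via the per-request field."""
    if preset not in {"quality", "budget"}:
        raise ValueError(f"unknown preset: {preset!r}")
    return {
        "model": "openfusion",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        # Assumed shape: a top-level "openfusion" object holding the override.
        "openfusion": {"preset": preset},
    }


print(json.dumps(build_request("Summarize mixture-of-agents.", "budget"), indent=2))
```

With the OpenAI Python SDK, a non-standard field like this can be sent through the extra_body argument of client.chat.completions.create, for example extra_body={"openfusion": {"preset": "budget"}}. The override only takes effect when the server has allow_request_overrides enabled.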

The UI source lives in web/ (Vite + React + TypeScript + Tailwind v4 + shadcn-style components):

cd web
npm install
npm run dev      # dev server (proxy /v1 to a running openfusion on :8000)
npm run build    # writes built assets into openfusion/static/playground/ (commit them)

"openfusion: command not found": the console script lives in the environment you installed it into. Either install it as a tool so it's always on PATH (uv tool install . or pipx install .), or activate the venv you used (source .venv/bin/activate). A bare pip install -e . does not put openfusion on your global PATH.

Playground says "Couldn't reach the server": open the page at the URL the running server prints (default http://localhost:8000), not a dev-server port or a standalone file.

No upstream API key: set OPENROUTER_API_KEY, run openfusion setup, or paste your key into the playground.

Backend: Python 3.11+ / FastAPI / httpx / uvicorn. Frontend: React / Vite / Tailwind / shadcn.

Contributions are welcome: openfusion is meant to be forked and tuned. See CONTRIBUTING.md for dev setup and the PR checklist, and CODE_OF_CONDUCT.md. Please report security issues privately per SECURITY.md rather than as a public issue.

MIT.
