{"slug": "show-hn-openfusion-enhanced-results-from-a-panel-of-models", "title": "Show HN: Openfusion - enhanced results from a panel of models", "summary": "Openfusion, an open-source drop-in compound-model proxy, lets users point any OpenAI-compatible tool at it to fan out prompts to a panel of LLMs in parallel, then a judge model synthesizes a single answer. The project aims to improve answer quality by combining multiple models, offering a tunable alternative to OpenRouter's Fusion. It includes a terminal chat, web playground, and supports presets like 'quality' and 'budget'.", "body_md": "An open-source, drop-in compound-model proxy. Point any OpenAI-compatible tool at it,\nset `model: \"openfusion\"`, and your prompt is fanned out to a panel of LLMs in parallel —\nthen a judge model reads every response (consensus, contradictions, blind spots) and streams\nback a single synthesized answer that aims to beat any one of them.\n\nIt's the open version of the mixture-of-agents idea behind OpenRouter's Fusion: better answers from models you already pay for, as a tunable, forkable recipe instead of a black box.\n\n**Quick start** · [How it works](#how-it-works) · [Playground](#playground) · [Routing & strategies](#routing--strategies) · [vs. OpenRouter Fusion](#openfusion-vs-openrouter-fusion) · [Benchmarks](#benchmarks) · [Contributing](/shahar-dagan/openfusion/blob/main/CONTRIBUTING.md)\n\nNew here? You only need the first two to run it; the rest is for tuning and contributing.\n\n| Path | What it is |\n|---|---|\n| `openfusion/` | The proxy (FastAPI). Start with `server.py`; see |\n| `web/` | The playground UI source (React + shadcn). Built assets ship in `openfusion/static/`. |\n| `examples/` | Copy-paste config recipes (preset, dev, panel, bench…). You don't need a config to start. |\n| `bench/` | Reproducible head-to-head harness; `bench/FINDINGS.md` is where fusion does and doesn't pay off. 
|\n| `DESIGN.md` · `docs/` | Design rationale, architecture, and security notes. |\n\n**Beta** — panel fan-out, judge synthesis, SSE streaming, web-tool fusion, an Auto Router, debate/vote/ranked\naggregators, production limits, and an interactive playground. See [DESIGN.md](/shahar-dagan/openfusion/blob/main/DESIGN.md)\nand [docs/ARCHITECTURE.md](/shahar-dagan/openfusion/blob/main/docs/ARCHITECTURE.md) for architecture and security notes.\n\n`openfusion` has two front ends — an interactive terminal chat and a web playground. No clone, no\nconfig, no env vars needed to start.\n\n```\nuvx --from git+https://github.com/shahar-dagan/openfusion openfusion   # ephemeral, needs uv\n# …or: pip install git+https://github.com/shahar-dagan/openfusion && openfusion\n```\n\nBare `openfusion` drops you into a Rich-rendered chat with the model panel — a banner, a live\npanel-progress spinner, Markdown answers with syntax-highlighted code, and slash commands\n(`/preset`, `/tokens`, `/models`, `/key`, `/clear`). On first run it asks for your OpenRouter key and\n**saves it** (`~/.config/openfusion/credentials`), so later runs don't re-prompt; use `/key` to\nchange it. Pipe for one-shots: `echo \"…\" | openfusion`.\n\n```\nopenfusion web                                  # opens the playground in your browser\n# …or: docker run -p 8000:8000 ghcr.io/shahar-dagan/openfusion\n```\n\n`openfusion web` opens the playground at `http://localhost:8000` once the server is ready (pass\n`--no-open` to suppress this; it is skipped automatically in non-interactive/headless/Docker contexts). Paste your\nkey (kept only in server memory) and fuse. With nothing configured it boots the **Budget** preset (a\ndiverse panel + judge with web search) so the first run lands where fusion actually wins.\n\n```\nuv tool install .     # from a clone — or: pipx install . 
&& pipx ensurepath\n```\n\nFor active development, run `pip install -e .` inside an activated venv (the command then works only\nwhile that venv is active). A bare `pip install -e .` does not put `openfusion` on your global PATH —\nsee [Troubleshooting](#troubleshooting).\n\nFor a fixed recipe, write an `openfusion.yaml` (start from `examples/preset.yaml.example` for\n`preset: quality | budget`, or `examples/default.yaml.example` for a fully spelled-out panel/judge). A\n**preset** expands to a diverse OpenRouter panel + judge with web tools on, mirroring OpenRouter\nFusion's Quality/Budget switch:\n\n| Preset | Panel | Judge | Tools |\n|---|---|---|---|\n| `quality` | Claude Sonnet 4 · Gemini 3 Pro · DeepSeek V4 Pro | Claude Sonnet 4 | web search + fetch |\n| `budget` | GPT-4o-mini · DeepSeek V4 Pro · Kimi K2.6 | DeepSeek V4 Pro | web search + fetch |\n\nUse as a drop-in API from the OpenAI SDK (with `openfusion web` running):\n\n``` python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"http://localhost:8000/v1\", api_key=\"local-dev\")\nstream = client.chat.completions.create(\n    model=\"openfusion\",\n    messages=[{\"role\": \"user\", \"content\": \"Explain mixture-of-agents in one paragraph.\"}],\n    stream=True,\n)\nfor chunk in stream:\n    print(chunk.choices[0].delta.content or \"\", end=\"\")\n```\n\nOr straight from the terminal, no server needed:\n\n```\nopenfusion ask \"Compare Postgres and SQLite for a small SaaS.\" --max-tokens 800\n```\n\n`ask` runs one fusion against your configured panel and streams the synthesized answer to stdout\n(panel progress goes to stderr). `--max-tokens` caps every call — lower is faster and cheaper.\n\n**Speed & length.** Fusion runs N panel calls plus a judge, so it's slower than one model — the panel runs in parallel and the judge streams as soon as the panel finishes. 
The judge is prompted to stay concise, and you cap length with `--max-tokens` (CLI), `max_tokens` (API), the response-length control in the playground Settings, or `cost_controls` in config.\n\nThree knobs control *whether* and *how* a prompt is fused. All are optional; each stays off or at its default unless you set it.\n\n- **Auto Router** (`router.enabled: true`) — a per-prompt gate that answers simple prompts with a single pass-through call and reserves the panel for prompts that look likely to benefit (long, analytical, or containing code). Default is a cheap heuristic (no extra model call); `mode: model` uses a small classifier model and falls back to the heuristic if it errors:\n\n```\nrouter:\n  enabled: true\n  mode: heuristic     # heuristic | model | always | never\n  min_chars: 280      # prompts at/over this length fuse\n  # classifier:       # required for mode: model\n  #   base_url: https://openrouter.ai/api/v1\n  #   api_key: ${OPENROUTER_API_KEY}\n  #   model: openai/gpt-4o-mini\n```\n\n- **Strategy** (`strategy:`) — how the panel is produced: `self_fusion` (one model sampled N times), `panel` (a fixed diverse panel), or `debate` (a diverse panel where each member revises after seeing the others' answers, then the judge synthesizes). Debate trades extra cost/latency for cross-examination:\n\n```\nstrategy: debate\ndebate:\n  rounds: 1           # revision rounds before the judge\n```\n\n- **Aggregator** (`aggregator:`) — how answers become one: `judge` (synthesis, default), `vote` (majority vote, cheaper, best for verifiable short-answer tasks), or `ranked` (one short judge call picks the single best answer — cheaper than synthesis, uses model judgment unlike vote). 
\n- **Analysis transparency** (`analysis.emit: true`) — surface the judge's structured reasoning (consensus / contradictions / partial coverage / unique insights / blind spots) as a separate SSE `event: analysis` (and an `analysis` field on non-streaming responses), without polluting the answer body.\n- **Prompt caching** (`cache.enabled: true`) — mark the shared prefix so self-fusion's N samples reuse a cached prompt on providers that support it (a no-op elsewhere).\n\nFor public deployments, bound load and spend (both default to `0` = unlimited):\n\n```\nlimits:\n  max_in_flight: 64           # cap concurrent requests; over-limit returns 503\n  rate_limit_per_minute: 60   # per gateway key (or per client when unauthenticated); over-limit returns 429\n```\n\nThese are best-effort, single-process guards — pair them with provider-side budgets and, for multi-replica deployments, an edge rate limiter.\n\nA request to `model: \"openfusion\"` is fanned out to a panel of models in parallel (each optionally\ndoing its own web research), then a judge model reads every answer and synthesizes one — streamed\nback over SSE, with the structured analysis and cost alongside.\n\n``` mermaid\nflowchart LR\n    C[\"Client<br/>(Cursor · OpenAI SDK · anything)\"] -->|\"POST /v1/chat/completions<br/>model=openfusion\"| R{\"Router<br/><i>(optional)</i>\"}\n    R -->|simple prompt| S[\"Single model\"] --> OUT\n    R -->|worth fusing| P\n\n    subgraph P [\"Panel · parallel fan-out\"]\n        direction TB\n        A[\"Model A 🔍\"]\n        B[\"Model B 🔍\"]\n        D[\"Model C 🔍\"]\n    end\n\n    P --> J[\"Judge<br/>consensus · contradictions · blind spots\"]\n    J --> OUT[\"Streamed answer (SSE)<br/>+ analysis + token/cost\"]\n    C -.->|other model / client tools| S\n\n    classDef accent fill:#eef2ff,stroke:#4f46e5,color:#3730a3;\n    class J,R accent;\n```\n\n**Drop-in.** OpenAI-compatible `POST /v1/chat/completions` + `/v1/models`, real SSE streaming.\n\n**No 
lock-in.** Each panel member + judge is `{base_url, api_key, model}`. OpenRouter is the default upstream; OpenAI, Together, local vLLM/Ollama all work.\n\n**Config-driven.** Panel, judge, strategy, aggregator, router, and limits live in `openfusion.yaml` — or a one-word `preset`, or nothing at all (zero-config quick start).\n\nopenfusion is an open implementation of the idea behind OpenRouter Fusion. The core mechanism is at parity; the differences are scale and a per-prompt router.\n\n| | OpenRouter Fusion | openfusion |\n|---|---|---|\n| Parallel panel → judge synthesis | ✅ | ✅ |\n| Synthesis dimensions | consensus · contradictions · partial coverage · unique insights · blind spots | same |\n| Web search + fetch on the panel | ✅ (default) | ✅ (on by default with `preset:`) |\n| Quality / Budget presets | ✅ | ✅ (`preset: quality` / `preset: budget`) |\n| Override panel + judge | ✅ (plugin fields) | ✅ (any `{base_url, api_key, model}` in YAML) |\n| Per-call cost breakdown | ✅ (Activity) | ✅ (SSE `usage` event + `/metrics`) |\n| Self-hostable / forkable | ❌ closed API | ✅ MIT, any OpenAI-compatible provider |\n| Per-prompt Auto Router | ✅ | ✅ heuristic or model classifier (`router.enabled`) |\n| Structured analysis surfaced | ✅ | ✅ `analysis.emit` (SSE `analysis` event) |\n| Multi-round debate | — | ✅ `strategy: debate` |\n| Concurrency cap + rate limiting | ✅ | ✅ `limits` (best-effort, single-process) |\n| Interactive web playground | ✅ | ✅ embedded at `/playground` (zero-build) |\n| Headline benchmark | full DRACO (100 tasks) | DRACO subset (10 tasks) — see |\n\n| Parameter | Applies to | Notes |\n|---|---|---|\n| `temperature` (client) | Judge only indirectly via recipe | Self-fusion varies panel temps from config, not client |\n| `max_tokens`, `stop`, `response_format` | Judge (visible output) | Panel members use recipe defaults |\n| `stream`, `stream_options` | Judge path | Panel always runs non-streamed internally |\n| `tools` / `tool_calls` | Fusion or pass-through | 
Server-executable web tools (`openrouter:web_search` / `web_fetch`) are fused; client-side function tools and mid-conversation tool turns pass through |\n\n| Variable | Purpose |\n|---|---|\n| `OPENROUTER_API_KEY` | Default upstream key (via `${OPENROUTER_API_KEY}` in config) |\n| `OPENFUSION_CONFIG` | Path to config file (default: `openfusion.yaml`) |\n| `OPENFUSION_API_KEYS` | Comma-separated gateway allowlist (optional) |\n| `OPENFUSION_HOST` / `OPENFUSION_PORT` | Server bind address |\n\n`cost_controls` in config caps `max_tokens` for pass-through, panel, and judge calls. Missing\n`max_tokens` values are filled from the configured ceiling; over-limit pass-through and judge\nrequests return `400`, while internal panel calls clamp to their ceiling.\n\nRun the opt-in live OpenRouter smoke test only when you intend to spend a small number of credits:\n\n```\nexport OPENROUTER_API_KEY=your-key\npython scripts/openrouter_smoke.py --config examples/dev.yaml.example --yes-spend-credits\n```\n\nRun the head-to-head benchmark (self-fusion vs solo model):\n\n```\npip install -e \".[dev]\"\npython bench/run.py --config examples/default.yaml.example --tasks bench/tasks/sample.jsonl\n```\n\nUse `--tasks bench/tasks/smoke.jsonl --max-tokens 32` before larger benchmark runs.\n\nEach run reports accuracy **plus** the spend it took to get there — `total_tokens` and\n`total_cost_usd` per mode — so you can weigh any accuracy change against the extra cost of fanning\nout to a panel.\n\nThe bundled `bench/tasks/sample.jsonl` (20 short Q&A tasks) is **saturated** for a capable model —\nthe solo baseline already scores ~100%, so there is no headroom for fusion to add accuracy. 
On a\nrecent run with `openai/gpt-4o-mini` (self-fusion N=2, `max_tokens=32`):\n\n| Mode | Accuracy | Avg latency | Tokens | Cost |\n|---|---|---|---|---|\n| Solo | 100% (20/20) | 0.55s | 536 | $0.0001 |\n| Self-fusion | 95% (19/20) | 1.40s | 4,669 | $0.0008 |\n\nSo on easy tasks fusion does **not** beat a single call — it costs more (here ~9× the tokens) and\ncan even regress, because the judge only has trivially correct answers to choose between. This is\nexpected: mixture-of-agents helps where a single model is *unreliable*, not where it is already\nright.\n\nopenfusion makes **no** \"beats frontier\" claim. Demonstrating where fusion earns its cost needs a harder eval (one the solo baseline does not already ace) scored on quality per dollar, not accuracy alone. That eval is in progress; this table will be updated to show where fusion does and doesn't pay off. Claim only what your own `bench/run.py` run proves on your model and tasks.\n\nThe proxy exposes Prometheus metrics at `GET /metrics` (unauthenticated and intended for scraping only, so bind the server accordingly):\n\n- `openfusion_requests_total{route,outcome}` — client-facing requests (`fusion` / `pass_through`).\n- `openfusion_upstream_requests_total{phase,outcome}` — upstream calls by `panel` / `judge` / `pass_through`.\n- `openfusion_panel_members_total{outcome}` — per-member success vs. degraded failures.\n- `openfusion_tokens_total{phase,kind}` and `openfusion_cost_usd_total{phase}` — token and cost spend.\n- `openfusion_request_latency_ms` / `openfusion_upstream_latency_ms` — latency summaries (`_count` + `_sum`).\n\nCost (`usage.cost`, when the upstream reports it) is also rolled into the per-request SSE\n`event: usage` payload and the non-streaming `usage` field, so a single fusion call shows what it\nspent across the panel and judge. 
Per-call structured logs remain on the `openfusion.upstream` logger.\n\nThe server hosts an interactive playground at `GET /playground` (and `GET /` redirects there). It's\na React + Tailwind + shadcn UI whose **built assets ship in the package** (no Node needed to run);\nit talks only to the local `/v1` API, so provider keys never reach the browser. You can:\n\n- paste your OpenRouter API key on first run (held only in server memory; enabled by `allow_ui_api_key`, on for the zero-config quick start),\n- pick a **Quality / Budget / Custom** panel and a \"Fuse with\" judge model,\n- toggle web search, send a prompt, and watch the **panel → synthesis** progress,\n- read the streamed answer plus the judge's **structured analysis** (consensus / contradictions / blind spots) and the **token + cost** breakdown.\n\nThe model selectors are editable when the server sets `allow_request_overrides: true` (on for the\nquick start), which enables the per-request `openfusion: { preset | panel | judge | tools }` field\n(mirroring OpenRouter Fusion's `analysis_models` / `model` plugin fields). Overrides reuse the\nserver's upstream credentials — clients choose model *ids*, never keys — and stay bounded by gateway\nauth, cost ceilings, and rate limits. Read `GET /v1/config` for the active panel/judge and flags.\n\nThe UI source lives in `web/` (Vite + React + TypeScript + Tailwind v4 + shadcn-style components):\n\n```\ncd web\nnpm install\nnpm run dev      # dev server (proxy /v1 to a running openfusion on :8000)\nnpm run build    # writes built assets into openfusion/static/playground/ (commit them)\n```\n\n**openfusion: command not found** — the console script lives in the environment you installed it\ninto. Either install it as a tool so it's always on `PATH` (`uv tool install .` or `pipx install .`),\nor activate the venv you used (`source .venv/bin/activate`). 
A bare `pip install -e .` does not put\n`openfusion` on your global `PATH`.\n\n**Playground says \"Couldn't reach the server\"** — open the page at the URL the running server prints\n(default `http://localhost:8000`), not a dev-server port or a standalone file.\n\n**No upstream API key** — set `OPENROUTER_API_KEY`, run `openfusion setup`, or paste your key into\nthe playground.\n\nBackend: Python 3.11+ / FastAPI / httpx / uvicorn. Frontend: React / Vite / Tailwind / shadcn.\n\nContributions are welcome — openfusion is meant to be forked and tuned. See\n[CONTRIBUTING.md](/shahar-dagan/openfusion/blob/main/CONTRIBUTING.md) for dev setup and the PR checklist, and\n[CODE_OF_CONDUCT.md](/shahar-dagan/openfusion/blob/main/CODE_OF_CONDUCT.md). Please report security issues privately\nper [SECURITY.md](/shahar-dagan/openfusion/blob/main/SECURITY.md) rather than as a public issue.\n\nMIT.", "url": "https://wpnews.pro/news/show-hn-openfusion-enhanced-results-from-a-panel-of-models", "canonical_source": "https://github.com/shahar-dagan/openfusion", "published_at": "2026-06-18 09:00:35+00:00", "updated_at": "2026-06-18 09:23:37.064970+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "ai-infrastructure", "generative-ai", "ai-agents"], "entities": ["Openfusion", "OpenRouter", "OpenAI", "Claude Sonnet 4", "Gemini 3 Pro", "DeepSeek V4 Pro", "GPT-4o-mini", "Kimi K2.6"], "alternates": {"html": "https://wpnews.pro/news/show-hn-openfusion-enhanced-results-from-a-panel-of-models", "markdown": "https://wpnews.pro/news/show-hn-openfusion-enhanced-results-from-a-panel-of-models.md", "text": "https://wpnews.pro/news/show-hn-openfusion-enhanced-results-from-a-panel-of-models.txt", "jsonld": "https://wpnews.pro/news/show-hn-openfusion-enhanced-results-from-a-panel-of-models.jsonld"}}