Surpassing Frontier Performance with Open Source Fusion


[ARTICLE · art-33229] src=trustedrouter.com ↗ pub=2026-06-18T20:41Z topic=artificial-intelligence verified=true sentiment=↑ positive

A committee of five open-weights models, fused into a single answer, scored 69.9 on the DRACO benchmark, beating Anthropic's Fable 5, which scored 65.3. The all-open panel (MiniMax M3, Kimi K2.6, DeepSeek V4 Pro, Gemma-4, and GLM-5.2) shows that diverse open models can outperform a single closed model on deep research tasks without relying on proprietary APIs.

Published Jun 18, 2026 · 3 min read

A committee of five open-weights models you can download, fused into a single answer, beats Anthropic's mid-tier closed model Fable 5 on deep research. We ran an all-open panel (MiniMax M3, Kimi K2.6, DeepSeek V4 Pro, Gemma-4, and GLM-5.2), each model driving its own agentic research loop, then fused the five reports with MiniMax M3 as synthesizer, the best fuser we tested. On DRACO, judged by the same gemini-3.1-pro grader we use everywhere, that stack scores 69.9. Fable 5 running solo scores 65.3. The open committee wins by 4.6 points, and every weight in it sits on disks you control.

The reason this works is that fusion rewards diverse error, and five independently-trained open models disagree in useful ways. Each panelist searches the web, reads its own sources, and writes its own report; where one hallucinates a date or misses a primary source, the others usually don't, and the synthesizer keeps what survives cross-examination. A single model, even a good closed one like Fable 5, has one failure mode and repeats it through the whole answer. The committee has five, mostly uncorrelated, and the judge catches the difference. We have argued before that the strongest open models never show up on the leaderboards that rank them solo, and this is the mechanism: their value shows up in the ensemble, and a single-shot score never measures it.
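As a toy illustration of why mostly uncorrelated errors help, the sketch below swaps the LLM synthesizer for a plain support count: a claim survives only if at least `min_support` panelists make it. The panelist names come from the article, but the claim labels, the `fuse_claims` helper, and the threshold are hypothetical. The real synthesizer is a model (MiniMax M3) writing a fused report, not a vote counter.

```python
from collections import Counter

def fuse_claims(reports: dict[str, set[str]], min_support: int = 2) -> set[str]:
    """Keep only the claims asserted by at least `min_support` panelists."""
    # Each report is a set, so a claim is counted at most once per panelist.
    support = Counter(claim for claims in reports.values() for claim in claims)
    return {claim for claim, count in support.items() if count >= min_support}

# Hypothetical data: A, B, C stand for well-sourced claims; X1..X3 stand for
# errors that only one panelist makes (a wrong date, a missed source, ...).
reports = {
    "MiniMax M3": {"A", "B", "C"},
    "Kimi K2.6": {"A", "B", "X1"},
    "DeepSeek V4 Pro": {"A", "C", "X2"},
    "Gemma-4": {"B", "C"},
    "GLM-5.2": {"A", "B", "C", "X3"},
}

fused = fuse_claims(reports)
print(sorted(fused))
```

Because each error appears in only one report, none of them reaches the support threshold, while every shared claim does. If all five panelists made the same mistake, the count would keep it, which is why the diversity of the panel matters.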

| Panel | Synthesizer | DRACO |
|---|---|---|
| Frontier-mixed (incl. GPT-5.5, Opus) | MiniMax M3 | 71.6 |
| All-open (M3 / K2.6 / V4 Pro / Gemma-4 / GLM-5.2) | MiniMax M3 | 69.9 |
| Fable 5 + GPT-5.5 | GPT-5.5 | 69.0 |
| Fable 5 solo | n/a | 65.3 |

The obvious objection: does this beat the closed frontier? No. Our frontier-mixed panel, with GPT-5.5 and Opus sitting on the committee, scores 71.6. Letting real frontier models into the panel buys 1.7 points over the open-only version, and that gap is consistent run to run. The open committee also edges past the best closed-ish fusion we have published, a Fable 5 plus GPT-5.5 panel at 69.0. So we make the narrower, fully defensible claim: a committee that touches no proprietary API anywhere clears a single mid-tier closed model by a real margin.
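The margins quoted above can be checked directly from the reported DRACO scores. This is a small arithmetic sketch: the scores are the ones in the article, while the dictionary keys and the `margin` helper are labels chosen here for illustration.

```python
# DRACO scores as reported in the article.
scores = {
    "frontier-mixed": 71.6,
    "all-open": 69.9,
    "fable5+gpt5.5": 69.0,
    "fable5-solo": 65.3,
}

def margin(a: str, b: str) -> float:
    """Score difference a - b, rounded to the one decimal the article reports."""
    return round(scores[a] - scores[b], 1)

print(margin("all-open", "fable5-solo"))     # open committee vs. Fable 5 solo
print(margin("frontier-mixed", "all-open"))  # what frontier panelists add
print(margin("all-open", "fable5+gpt5.5"))   # open committee vs. closed-ish fusion
```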

The synthesizer is the seat that decides whether the stack stays open, so we put MiniMax M3 in it: the top fuser we tested, open weights, and it never blanks. The runner-up, GLM-5.2, ties it on score but censors: on politically loaded tasks it refuses or goes blank, Taiwan being the cleanest example, and a synthesizer that won't write the answer is worthless on exactly the questions where fusion matters most. TrustedRouter Fusion covers that case directly: when the synthesizer returns nothing, it falls through to a backup model (here Gemma-4, also open weights) and the run keeps going. With M3 in the seat we never trip the fallback. Swap in GLM-5.2 and you would, but the answer would still come out, and no task would reach for a closed model to get unblocked.
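The fall-through behavior described above can be sketched as a chain of synthesizers tried in order. This is a minimal sketch under stated assumptions, not the TrustedRouter Fusion code: `fuse_with_fallback` and the stub synthesizers are hypothetical, and a real synthesizer would be a model call rather than a local function.

```python
from typing import Callable, Sequence

# Assumed interface: a synthesizer takes the panel's reports, returns one answer.
Synthesizer = Callable[[Sequence[str]], str]

def fuse_with_fallback(
    reports: Sequence[str],
    chain: Sequence[tuple[str, Synthesizer]],
) -> tuple[str, str]:
    """Try each synthesizer in order; return (name, answer) for the first non-blank answer."""
    for name, synthesize in chain:
        answer = synthesize(reports)
        if answer.strip():  # a blank or whitespace-only answer counts as a refusal
            return name, answer
    raise RuntimeError("every synthesizer in the chain returned a blank answer")

# Stubs standing in for model calls (hypothetical).
def blank_synth(reports: Sequence[str]) -> str:
    return ""  # simulates a synthesizer that goes blank on the task

def join_synth(reports: Sequence[str]) -> str:
    return "\n\n".join(reports)  # simulates a backup that always answers

reports = ["report 1", "report 2"]
used, fused = fuse_with_fallback(
    reports,
    [("primary (stub)", blank_synth), ("backup (stub)", join_synth)],
)
print(used)
```

Keeping every entry in the chain open-weights is what keeps the whole run open: the fallback only changes which local model writes the answer.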

What makes the result matter is what it removes. This configuration has no API key, no per-token bill, no provider that can deprecate the model under you or refuse a prompt class on its own policy. The panelists are open weights, the synthesizer is open weights, the fallback is open weights, and the whole loop runs on hardware you own. For deep research with its long horizon, many sources, and real cost per query, that is the difference between renting a capability and holding one. The pieces, the exact panel, and the DRACO harness are all public: the models we route are listed on our models page, and the fusion code is in TrustedRouter-Fusion-Draco. Five models you can download, judged by the same grader as everything else, land above a closed model people pay for.

source & further reading

Read original on trustedrouter.com → trustedrouter.com/blog/open-fusion-beats-fable-5
