# Sakana AI's Fugu orchestrates multiple LLMs to match Anthropic's Fable and Mythos benchmarks

> Source: <https://the-decoder.com/sakana-ais-fugu-orchestrates-multiple-llms-to-match-anthropics-fable-and-mythos-benchmarks/>
> Published: 2026-06-22 08:18:59+00:00

# Sakana AI's Fugu orchestrates multiple LLMs to match Anthropic's Fable and Mythos benchmarks

## Key Points

- Japanese AI startup Sakana AI is launching Fugu, a system that dynamically coordinates multiple language models from a swappable pool while behaving like a single model through one API.
- Sakana says Fugu outperforms Anthropic's best models, Fable and Mythos, in benchmarks, even though neither model is part of its LLM pool.
- Fugu comes in a base version for everyday tasks and a more powerful Fugu Ultra variant. The swappable pool design also aims to reduce dependence on any single AI provider.

**Tokyo-based AI startup Sakana AI is launching Fugu, a system that dynamically coordinates multiple AI models to compete with leading systems like Anthropic's Fable 5. The approach also aims to reduce dependence on any single AI provider.**

Tokyo-based startup [Sakana AI](https://sakana.ai/fugu-release/) has unveiled Fugu, a multi-LLM orchestrator that looks and feels like a single model to the user. Sakana already had strong results with orchestrator setups for coding. Its [ALE-Agent](https://the-decoder.de/sakana-ais-ki-agent-schafft-es-unter-die-besten-21-von-1000-code-experten/) placed 21st out of 1,000 human experts in a coding competition.

Fugu is itself a language model, trained to call other LLMs from an agent pool, including copies of itself. Depending on the request, it either handles a task on its own or pulls together a team of specialized models. Selection, delegation, checks, and synthesis all run internally. Users access everything through a single OpenAI-compatible API.

## Fugu Ultra aims to match top-tier models

Sakana AI is launching two variants. The base Fugu model targets low latency and solid everyday performance across coding, code review, and chatbot use cases. Teams with privacy or compliance needs can exclude specific agents from the pool.

Fugu Ultra is built for maximum answer quality on complex, multi-step problems. Early users have put it to work on AI research, reproducing scientific papers, cybersecurity analysis, and patent and literature searches.

According to [benchmark results Sakana AI published](https://github.com/SakanaAI/fugu/blob/main/Fugu_technical_report.pdf), Fugu Ultra performs on par with Anthropic's Fable 5 and Mythos Preview across a range of coding, reasoning, science, and agent benchmarks.

Neither Anthropic model is in Fugu's agent pool, though, since they aren't publicly available. With those models included, Fugu would likely score even higher. Sakana AI says the baseline comparison numbers come from the model providers themselves. The table below shows how Fugu stacks up against the underlying base models.

| Benchmark | Fugu | Fugu Ultra | Opus 4.8 | Gemini 3.1 Pro | GPT 5.5 |
|---|---|---|---|---|---|
| SWE Bench Pro | 59.0 | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 80.2 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 92.9 | 93.2 | 87.8 | 88.5 | 85.3 |
| LiveCodeBench Pro | 87.8 | 90.8 | 84.8 | 82.9 | 88.4 |
| Humanity's Last Exam | 47.2 | 50.0 | 49.8 | 44.4 | 41.4 |
| CharXiv Reasoning | 85.1 | 86.6 | 84.2 | 83.3 | 84.1 |
| GPQA-D | 95.5 | 95.5 | 92.0 | 94.3 | 93.6 |
| SciCode | 60.1 | 58.7 | 53.5 | 58.9 | 56.1 |
| τ³ Banking | 21.7 | 20.6 | 20.6 | 8.4 | 20.6 |
| Long-Context Reasoning | 74.7 | 73.3 | 67.7 | 72.7 | 74.3 |
| MRCRv2 | 86.6 | 93.6 | 87.9 | 84.9 | 94.8 |

## Orchestration as a hedge against vendor lock-in

Sakana AI is pitching Fugu as a safeguard against single-provider dependence. The company points to the recent [export controls on Anthropic's Fable and Mythos models](https://the-decoder.com/amazon-and-five-other-companies-reportedly-triggered-the-government-crackdown-on-anthropics-fable-model/) as a concrete example. Access to top AI systems can vanish overnight due to regulatory shifts or [foreign policy decisions](https://the-decoder.com/us-government-forces-anthropic-to-disable-claude-fable-5-and-mythos-5-for-all-customers-worldwide/).

"For an organization or a nation, relying on a single company’s APIs for critical infrastructure, finance, or governance is a material vulnerability. This risk is no longer a hypothetical possibility, but a reality," Sakana AI writes in its [announcement](https://sakana.ai/fugu-release/). Fugu's model pool is fully swappable, so the system can reroute to other models if one provider goes dark.

The system's real-world performance depends entirely on which models are in the pool, though. If several top providers restrict access at the same time, Fugu's options shrink too. An orchestrator like Fugu may boost resilience, but it's not the same as true sovereignty. Still, Fugu could be worth watching on performance alone.

## Early testers report gains on complex workflows

About 500 beta users have already tested the system in real-world settings, according to Sakana AI. Fugu proved strongest on long, multi-step workflows like automated data research, security analysis, and code reviews.

One software developer says Fugu Ultra catches far more bugs during code review than GPT-5.5. "Where other tools flag about three issues, Fugu surfaced more than twenty." Sakana AI also claims Fugu beat Gemini 3.1 Pro, Opus 4.8, and GPT 5.5 in its own tests on automated research, mechanical design, and financial forecasting.

*Video: According to Sakana, Fugu solves and visualizes a Rubik's Cube faster than the individual models.*

"The beta made clear that multi-agent orchestration matters most when the task is messy, long-running, and difficult to solve with a single model call," writes Sakana AI.

Both variants are live now through a single API on the [product page](https://sakana.ai/fugu/) and [console](https://console.sakana.ai/). Sakana offers subscription plans for daily use and usage-based billing for bigger workloads.

## Sakana's bet is an AI ecosystem rather than a single model

Fugu's technical approach builds on Sakana AI's own research into learned model orchestration, specifically two papers presented at ICLR 2026 called [Trinity](https://sakana.ai/trinity/) and [Conductor](https://sakana.ai/learning-to-orchestrate/).

The idea fits Sakana AI's broader vision of [applying natural principles like swarm behavior, evolution, and collective intelligence to AI systems](https://the-decoder.com/sakana-ai-bets-ai-that-improves-itself-can-break-the-compute-arms-race-of-frontier-labs/). The company sees powerful AI not as a single-model problem but as a collaborative ecosystem that goes beyond what any one model can do alone.

Sakana AI was founded by former [Google AI researchers Llion Jones and David Ha](https://the-decoder.com/microsoft-and-google-dropouts-launch-ai-offerings-in-japan/). Jones co-authored the 2017 "Attention Is All You Need" paper that gave us the Transformer.

```
AI News Without the Hype – Curated by Humans

					Subscribe to THE DECODER for ad-free reading, a weekly AI newsletter, our exclusive "AI Radar" frontier report six times a year, full archive access, and access to our comment section.				

					Subscribe now
```