Fusion Harness: How to combine an expensive main model with a cheaper sidekick model

A developer built a fusion-style delegation harness using the OpenHands SDK that combines a high-capability main LLM agent with a cheaper sidekick agent. The harness allows the main agent to issue multiple parallel task-tool calls to the sidekick, which can escalate back to the main agent when work exceeds its budget. This approach aims to reduce costs by delegating read-only investigation and small tasks to the sidekick while reserving complex decisions for the main model.
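The delegation pattern described above can be illustrated without any SDK at all. The following is a minimal sketch under stated assumptions, not the harness itself: `cheap_sidekick` and `expensive_main` are hypothetical stand-ins for real model calls, and the `ESCALATE_TO_MAIN:` marker spelling is assumed. It shows the two moving parts: tasks fan out to the sidekick concurrently, and any report that starts with the escalation marker is handed to the main model.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

ESCALATE_PREFIX = "ESCALATE_TO_MAIN:"  # assumed spelling of the escalation marker


def cheap_sidekick(task: str) -> str:
    """Hypothetical stand-in for a cheap model call."""
    if "redesign" in task:
        return f"{ESCALATE_PREFIX} needs an architecture decision"
    return f"FINDINGS: looked at {task!r}"


def expensive_main(task: str, reason: str) -> str:
    """Hypothetical stand-in for the high-capability model."""
    return f"MAIN handled {task!r} ({reason})"


def fan_out(tasks: list[str], max_workers: int = 3) -> dict[str, str]:
    """Run every task on the sidekick concurrently, then route escalations to main."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so reports line up with tasks.
        reports = list(pool.map(cheap_sidekick, tasks))

    results: dict[str, str] = {}
    for task, report in zip(tasks, reports):
        if report.startswith(ESCALATE_PREFIX):
            reason = report[len(ESCALATE_PREFIX):].strip()
            results[task] = expensive_main(task, reason)
        else:
            results[task] = report
    return results


if __name__ == "__main__":
    for task, outcome in fan_out(["inspect tests/", "redesign auth module"]).items():
        print(f"{task}: {outcome}")
```

In this sketch the main model is only invoked for escalated tasks, which is where the cost saving would come from. A real harness would also have the main model review the non-escalated reports before acting on them.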

```python
"""Fusion-style delegation harness built with the OpenHands SDK.

Install:
    uv pip install openhands-sdk openhands-tools

Run:
    export LLM_API_KEY="..."  # or export OPENHANDS_API_KEY="..."
    export MAIN_MODEL="openhands/gpt-5.5"
    export SIDEKICK_MODEL="openhands/minimax-m2.7"
    uv run python fusion_harness_example.py "Find and fix the failing tests in this repo."

What this demonstrates:
- the main agent keeps one high-capability LLM profile for the whole task
- the cheap sidekick is registered as a sub-agent with its own LLM profile
- the main agent can issue several task-tool calls in one response
- tool_concurrency_limit lets those sidekick calls run concurrently
- sidekick returns ESCALATE_TO_MAIN when work exceeds its budget
"""

from __future__ import annotations

import os
import sys
import tempfile
from pathlib import Path

from pydantic import SecretStr

from openhands.sdk import Agent, AgentContext, Conversation, LLM, LLMProfileStore, Tool
from openhands.sdk.context import Skill
from openhands.sdk.subagent import register_agent
from openhands.sdk.subagent.schema import AgentDefinition
from openhands.tools.delegate import DelegationVisualizer
from openhands.tools.file_editor import FileEditorTool
from openhands.tools.task import TaskToolSet
from openhands.tools.terminal import TerminalTool

DEFAULT_MAIN_MODEL = "openhands/gpt-5.5"
DEFAULT_SIDEKICK_MODEL = "openhands/minimax-m2.7"

SIDEKICK_SKILL = """
You are a fast, low-cost sidekick agent. Your job is to help the main agent,
not to complete the whole user request on your own.

Rules:
1. Prefer read-only investigation: inspect files, locate relevant code, propose
   small plans, and draft patch sketches. Do not edit files unless the prompt
   explicitly asks you to.
2. Keep output compact and structured. Use at most three tool calls unless the
   prompt explicitly allows more. Prefer broad signals over exhaustive reading.
3. If the task requires broad architecture decisions, many-file edits, unknown
   product judgement, security-sensitive changes, or you are not confident,
   stop and return exactly:

   ESCALATE_TO_MAIN: <short reason>

4. Otherwise return:

   FINDINGS:
   - ...
   PROPOSED NEXT STEP:
   - ...
   RISK:
   - low|medium|high, with one sentence why
""".strip()

ORCHESTRATOR_SUFFIX = """
You are the main high-capability agent in a fusion-style harness.

Critical parallel-delegation protocol:
- If the user lists multiple independent areas, your initial delegation step MUST
  be a single assistant response containing one task tool call per area, all
  with `subagent_type='sidekick'`.
- Do not delegate only the first area and wait. Do not say you will launch
  several sidekicks and then call only one. Actually emit all sidekick task calls
  in the same tool-call batch so tool_concurrency_limit can run them in
  parallel.
- Before the initial delegation batch, avoid direct terminal/file-editor work
  unless the user did not provide enough information to form sidekick prompts.
- After all sidekick observations return, review them yourself, deduplicate, and
  make final prioritization with the main model.

Good sidekick tasks:
- locate relevant files
- inspect test failures or logs
- summarize a narrow subsystem
- draft a small patch plan
- check docs or dependency files

Do not delegate broad design, final decisions, risky edits, or cross-cutting
implementation. If any sidekick returns `ESCALATE_TO_MAIN`, stop delegating that
thread and handle it yourself with the main model.

Always review sidekick output before acting. Treat sidekick output as advisory,
not authoritative. The final answer and any code changes are your responsibility.
""".strip()


def require_api_key() -> str:
    api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENHANDS_API_KEY")
    if not api_key:
        raise RuntimeError(
            "Set LLM_API_KEY, or OPENHANDS_API_KEY when using OpenHands-hosted models."
        )
    return api_key


def save_profile(
    store: LLMProfileStore,
    name: str,
    usage_id: str,
    model: str,
    base_url: str | None,
) -> None:
    # Profiles are written without secrets; the API key is injected at load time.
    store.save(
        name,
        LLM(
            usage_id=usage_id,
            model=model,
            base_url=base_url,
        ),
        include_secrets=False,
    )


def load_profile(store: LLMProfileStore, name: str, api_key: str) -> LLM:
    return store.load(name).model_copy(update={"api_key": SecretStr(api_key)})


def build_profiles(api_key: str) -> tuple[LLMProfileStore, LLM]:
    base_url = os.getenv("LLM_BASE_URL")
    main_model = os.getenv("MAIN_MODEL", DEFAULT_MAIN_MODEL)
    sidekick_model = os.getenv("SIDEKICK_MODEL", DEFAULT_SIDEKICK_MODEL)

    profile_dir = Path(tempfile.mkdtemp(prefix="openhands-fusion-profiles-"))
    store = LLMProfileStore(base_dir=str(profile_dir))
    save_profile(
        store,
        name="fusion-main",
        usage_id="main",
        model=main_model,
        base_url=base_url,
    )
    save_profile(
        store,
        name="fusion-sidekick",
        usage_id="sidekick",
        model=sidekick_model,
        base_url=base_url,
    )
    return store, load_profile(store, "fusion-main", api_key)


def register_sidekick(store: LLMProfileStore, api_key: str) -> None:
    # The factory ignores the LLM it is handed and loads the sidekick profile instead.
    def create_sidekick(_: LLM) -> Agent:
        return Agent(
            llm=load_profile(store, "fusion-sidekick", api_key),
            tools=[
                Tool(name=TerminalTool.name),
                Tool(name=FileEditorTool.name),
            ],
            tool_concurrency_limit=3,
            agent_context=AgentContext(
                skills=[
                    Skill(
                        name="fusion_sidekick_protocol",
                        content=SIDEKICK_SKILL,
                        trigger=None,
                    )
                ],
                system_message_suffix="Stay within the sidekick protocol.",
            ),
        )

    register_agent(
        name="sidekick",
        factory_func=create_sidekick,
        description=AgentDefinition(
            name="sidekick",
            description=(
                "Fast low-cost sub-agent for bounded investigation, patch "
                "sketches, and escalation signals."
            ),
            max_iteration_per_run=int(os.getenv("SIDEKICK_MAX_ITERATIONS", "6")),
            max_budget_per_run=float(os.getenv("SIDEKICK_MAX_BUDGET", "0.10")),
        ),
    )


def build_main_agent(main_llm: LLM) -> Agent:
    return Agent(
        llm=main_llm,
        tools=[
            Tool(name=TaskToolSet.name),
            Tool(name=TerminalTool.name),
            Tool(name=FileEditorTool.name),
        ],
        tool_concurrency_limit=int(os.getenv("MAIN_TOOL_CONCURRENCY", "8")),
        agent_context=AgentContext(system_message_suffix=ORCHESTRATOR_SUFFIX),
    )


def fusion_prompt(user_task: str) -> str:
    return f"""
User task:
{user_task}

Run this in a fusion-style workflow.

MANDATORY INITIAL DELEGATION RULE:
If the user task contains multiple independent areas, directories, files,
questions, or investigation threads, your next tool-using assistant response MUST
contain one task tool call for every independent item, all in the same response,
all with `subagent_type='sidekick'`. This is the core behavior being tested. Do
not call only one task. Do not perform direct terminal/file_editor investigation
first. Do not wait between sidekick launches.

After that parallel task batch:
1. Wait for all sidekick observations.
2. Review the sidekick reports yourself.
3. If any sidekick escalates or the work becomes complex, continue with the main
   model rather than switching models or restarting the conversation.
4. Complete the task and summarize what was done.
""".strip()


def run(user_task: str, workspace: Path) -> None:
    api_key = require_api_key()
    store, main_llm = build_profiles(api_key)
    register_sidekick(store, api_key)

    conversation = Conversation(
        agent=build_main_agent(main_llm),
        workspace=workspace,
        visualizer=DelegationVisualizer(name="Fusion main"),
        persistence_dir=Path(tempfile.mkdtemp(prefix="openhands-fusion-run-")),
        max_iteration_per_run=int(os.getenv("MAIN_MAX_ITERATIONS", "40")),
    )
    conversation.send_message(fusion_prompt(user_task))
    conversation.run()

    metrics = conversation.conversation_stats.get_combined_metrics()
    print(f"\nTotal estimated cost: ${metrics.accumulated_cost:.6f}")


def main() -> None:
    user_task = " ".join(sys.argv[1:]).strip()
    if not user_task:
        user_task = "Analyze this repository and suggest the smallest useful improvement."
    run(user_task=user_task, workspace=Path.cwd())


if __name__ == "__main__":
    main()
```
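The sidekick protocol in the listing asks for one of two reply shapes: an escalation line, or a report with `FINDINGS`, `PROPOSED NEXT STEP`, and `RISK` sections. If the replies ever need to be handled in code and not only read by the main model, a small parser along these lines might help. This is a sketch, not part of the harness: `SidekickReport` and `parse_report` are hypothetical names, the marker is assumed to be spelled `ESCALATE_TO_MAIN:` (the spaced form is accepted too), and section headers are assumed to sit on their own lines.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Accept "ESCALATE_TO_MAIN:" and "ESCALATE TO MAIN:" (exact spelling is an assumption).
ESCALATE_RE = re.compile(r"^\s*ESCALATE[_ ]TO[_ ]MAIN:\s*(.*)$", re.MULTILINE)
SECTION_RE = re.compile(r"^(FINDINGS|PROPOSED NEXT STEP|RISK):\s*$")


@dataclass
class SidekickReport:
    """Hypothetical container for a parsed sidekick reply."""

    escalated: bool = False
    reason: str = ""
    sections: dict[str, list[str]] = field(default_factory=dict)


def parse_report(text: str) -> SidekickReport:
    """Split a sidekick reply into an escalation signal or bulleted sections."""
    match = ESCALATE_RE.search(text)
    if match:
        return SidekickReport(escalated=True, reason=match.group(1).strip())

    report = SidekickReport()
    current: str | None = None
    for line in text.splitlines():
        stripped = line.strip()
        header = SECTION_RE.match(stripped)
        if header:
            current = header.group(1)
            report.sections[current] = []
        elif current and stripped.startswith("-"):
            # Drop the leading bullet marker and surrounding whitespace.
            report.sections[current].append(stripped[1:].strip())
    return report


if __name__ == "__main__":
    reply = "FINDINGS:\n- tests fail in test_api.py\nRISK:\n- low, isolated change"
    print(parse_report(reply))
    print(parse_report("ESCALATE_TO_MAIN: touches auth"))
```

Lines that are neither a known header nor a bullet are ignored, so stray prose in a reply does not break parsing. A stricter variant could treat unrecognized output as an implicit escalation.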