{"slug": "fusion-harness-how-to-combine-a-more-expensive-main-model-and-a-sidekick-model", "title": "Fusion Harness: How to combine a more expensive main model and a sidekick model", "summary": "A developer built a fusion-style delegation harness using the OpenHands SDK that combines a high-capability main LLM agent with a cheaper sidekick agent. The harness allows the main agent to issue multiple parallel task-tool calls to the sidekick, which can escalate back to the main agent when work exceeds its budget. This approach aims to reduce costs by delegating read-only investigation and small tasks to the sidekick while reserving complex decisions for the main model.", "body_md": "```python\n\"\"\"Fusion-style delegation harness built with the OpenHands SDK.\n\nInstall:\n    uv pip install openhands-sdk openhands-tools\n\nRun:\n    export LLM_API_KEY=\"...\" # or export OPENHANDS_API_KEY=\"...\"\n    export MAIN_MODEL=\"openhands/gpt-5.5\"\n    export SIDEKICK_MODEL=\"openhands/minimax-m2.7\"\n    uv run python fusion_harness_example.py \"Find and fix the failing tests in this repo.\"\n\nWhat this demonstrates:\n- the main agent keeps one high-capability LLM profile for the whole task\n- the cheap sidekick is registered as a sub-agent with its own LLM profile\n- the main agent can issue several task-tool calls in one response\n- tool_concurrency_limit lets those sidekick calls run concurrently\n- sidekick returns ESCALATE_TO_MAIN when work exceeds its budget\n\"\"\"\n\nfrom __future__ import annotations\n\nimport os\nimport sys\nimport tempfile\nfrom pathlib import Path\n\nfrom pydantic import SecretStr\n\nfrom openhands.sdk import Agent, AgentContext, Conversation, LLM, LLMProfileStore, Tool\nfrom openhands.sdk.context import Skill\nfrom openhands.sdk.subagent import register_agent\nfrom openhands.sdk.subagent.schema import AgentDefinition\nfrom openhands.tools.delegate import DelegationVisualizer\nfrom openhands.tools.file_editor import FileEditorTool\nfrom openhands.tools.task import TaskToolSet\nfrom openhands.tools.terminal import TerminalTool\n\n\nDEFAULT_MAIN_MODEL = \"openhands/gpt-5.5\"\nDEFAULT_SIDEKICK_MODEL = \"openhands/minimax-m2.7\"\n\nSIDEKICK_SKILL = \"\"\"\nYou are a fast, low-cost sidekick agent. Your job is to help the main agent,\nnot to complete the whole user request on your own.\n\nRules:\n1. Prefer read-only investigation: inspect files, locate relevant code, propose\n   small plans, and draft patch sketches. Do not edit files unless the prompt\n   explicitly asks you to.\n2. Keep output compact and structured. Use at most three tool calls unless the\n   prompt explicitly allows more. Prefer broad signals over exhaustive reading.\n3. If the task requires broad architecture decisions, many-file edits, unknown\n   product judgement, security-sensitive changes, or you are not confident,\n   stop and return exactly:\n   ESCALATE_TO_MAIN: <short reason>\n4. Otherwise return:\n   FINDINGS:\n   - ...\n   PROPOSED_NEXT_STEP:\n   - ...\n   RISK:\n   - low|medium|high, with one sentence why\n\"\"\".strip()\n\nORCHESTRATOR_SUFFIX = \"\"\"\nYou are the main high-capability agent in a fusion-style harness.\n\nCritical parallel-delegation protocol:\n- If the user lists multiple independent areas, your initial delegation step MUST\n  be a single assistant response containing one `task` tool call per area, all\n  with `subagent_type='sidekick'`.\n- Do not delegate only the first area and wait. Do not say you will launch\n  several sidekicks and then call only one. Actually emit all sidekick task calls\n  in the same tool-call batch so `tool_concurrency_limit` can run them in\n  parallel.\n- Before the initial delegation batch, avoid direct terminal/file-editor work\n  unless the user did not provide enough information to form sidekick prompts.\n- After all sidekick observations return, review them yourself, deduplicate, and\n  make final prioritization with the main model.\n\nGood sidekick tasks:\n- locate relevant files\n- inspect test failures or logs\n- summarize a narrow subsystem\n- draft a small patch plan\n- check docs or dependency files\n\nDo not delegate broad design, final decisions, risky edits, or cross-cutting\nimplementation. If any sidekick returns `ESCALATE_TO_MAIN`, stop delegating that\nthread and handle it yourself with the main model.\n\nAlways review sidekick output before acting. Treat sidekick output as advisory,\nnot authoritative. The final answer and any code changes are your responsibility.\n\"\"\".strip()\n\n\ndef require_api_key() -> str:\n    api_key = os.getenv(\"LLM_API_KEY\") or os.getenv(\"OPENHANDS_API_KEY\")\n    if not api_key:\n        raise RuntimeError(\n            \"Set LLM_API_KEY, or OPENHANDS_API_KEY when using OpenHands-hosted models.\"\n        )\n    return api_key\n\n\ndef save_profile(\n    store: LLMProfileStore,\n    name: str,\n    usage_id: str,\n    model: str,\n    base_url: str | None,\n) -> None:\n    store.save(\n        name,\n        LLM(\n            usage_id=usage_id,\n            model=model,\n            base_url=base_url,\n        ),\n        include_secrets=False,\n    )\n\n\ndef load_profile(store: LLMProfileStore, name: str, api_key: str) -> LLM:\n    return store.load(name).model_copy(update={\"api_key\": SecretStr(api_key)})\n\n\ndef build_profiles(api_key: str) -> tuple[LLMProfileStore, LLM]:\n    base_url = os.getenv(\"LLM_BASE_URL\")\n    main_model = os.getenv(\"MAIN_MODEL\", DEFAULT_MAIN_MODEL)\n    sidekick_model = os.getenv(\"SIDEKICK_MODEL\", DEFAULT_SIDEKICK_MODEL)\n\n    profile_dir = Path(tempfile.mkdtemp(prefix=\"openhands-fusion-profiles-\"))\n    store = LLMProfileStore(base_dir=str(profile_dir))\n    save_profile(\n        store,\n        name=\"fusion-main\",\n        usage_id=\"main\",\n        model=main_model,\n        base_url=base_url,\n    )\n    save_profile(\n        store,\n        name=\"fusion-sidekick\",\n        usage_id=\"sidekick\",\n        model=sidekick_model,\n        base_url=base_url,\n    )\n    return store, load_profile(store, \"fusion-main\", api_key)\n\n\ndef register_sidekick(store: LLMProfileStore, api_key: str) -> None:\n    def create_sidekick(_: LLM) -> Agent:\n        return Agent(\n            llm=load_profile(store, \"fusion-sidekick\", api_key),\n            tools=[\n                Tool(name=TerminalTool.name),\n                Tool(name=FileEditorTool.name),\n            ],\n            tool_concurrency_limit=3,\n            agent_context=AgentContext(\n                skills=[\n                    Skill(\n                        name=\"fusion_sidekick_protocol\",\n                        content=SIDEKICK_SKILL,\n                        trigger=None,\n                    )\n                ],\n                system_message_suffix=\"Stay within the sidekick protocol.\",\n            ),\n        )\n\n    register_agent(\n        name=\"sidekick\",\n        factory_func=create_sidekick,\n        description=AgentDefinition(\n            name=\"sidekick\",\n            description=(\n                \"Fast low-cost sub-agent for bounded investigation, patch \"\n                \"sketches, and escalation signals.\"\n            ),\n            max_iteration_per_run=int(os.getenv(\"SIDEKICK_MAX_ITERATIONS\", \"6\")),\n            max_budget_per_run=float(os.getenv(\"SIDEKICK_MAX_BUDGET\", \"0.10\")),\n        ),\n    )\n\n\ndef build_main_agent(main_llm: LLM) -> Agent:\n    return Agent(\n        llm=main_llm,\n        tools=[\n            Tool(name=TaskToolSet.name),\n            Tool(name=TerminalTool.name),\n            Tool(name=FileEditorTool.name),\n        ],\n        tool_concurrency_limit=int(os.getenv(\"MAIN_TOOL_CONCURRENCY\", \"8\")),\n        agent_context=AgentContext(system_message_suffix=ORCHESTRATOR_SUFFIX),\n    )\n\n\ndef fusion_prompt(user_task: str) -> str:\n    return f\"\"\"\nUser task:\n{user_task}\n\nRun this in a fusion-style workflow.\n\nMANDATORY INITIAL DELEGATION RULE:\nIf the user task contains multiple independent areas, directories, files,\nquestions, or investigation threads, your next tool-using assistant response MUST\ncontain one `task` tool call for every independent item, all in the same response,\nall with `subagent_type='sidekick'`. This is the core behavior being tested. Do\nnot call only one task. Do not perform direct terminal/file_editor investigation\nfirst. Do not wait between sidekick launches.\n\nAfter that parallel task batch:\n1. Wait for all sidekick observations.\n2. Review the sidekick reports yourself.\n3. If any sidekick escalates or the work becomes complex, continue with the main\n   model rather than switching models or restarting the conversation.\n4. Complete the task and summarize what was done.\n\"\"\".strip()\n\n\ndef run(user_task: str, workspace: Path) -> None:\n    api_key = require_api_key()\n    store, main_llm = build_profiles(api_key)\n    register_sidekick(store, api_key)\n\n    conversation = Conversation(\n        agent=build_main_agent(main_llm),\n        workspace=workspace,\n        visualizer=DelegationVisualizer(name=\"Fusion main\"),\n        persistence_dir=Path(tempfile.mkdtemp(prefix=\"openhands-fusion-run-\")),\n        max_iteration_per_run=int(os.getenv(\"MAIN_MAX_ITERATIONS\", \"40\")),\n    )\n    conversation.send_message(fusion_prompt(user_task))\n    conversation.run()\n\n    metrics = conversation.conversation_stats.get_combined_metrics()\n    print(f\"\\nTotal estimated cost: ${metrics.accumulated_cost:.6f}\")\n\n\ndef main() -> None:\n    user_task = \" \".join(sys.argv[1:]).strip()\n    if not user_task:\n        user_task = \"Analyze this repository and suggest the smallest useful improvement.\"\n    run(user_task=user_task, workspace=Path.cwd())\n\n\nif __name__ == \"__main__\":\n    main()\n```", "url": "https://wpnews.pro/news/fusion-harness-how-to-combine-a-more-expensive-main-model-and-a-sidekick-model", "canonical_source": "https://gist.github.com/neubig/412ab8df8e6fd0b2bdf10602d77f9d86", "published_at": "2026-06-29 23:28:43+00:00", "updated_at": "2026-06-29 23:48:21.802891+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "developer-tools"], "entities": ["OpenHands SDK", "GPT-5.5", "Minimax M2.7", "LLMProfileStore", "TaskToolSet", "DelegationVisualizer", "FileEditorTool", "TerminalTool"], "alternates": {"html": "https://wpnews.pro/news/fusion-harness-how-to-combine-a-more-expensive-main-model-and-a-sidekick-model", "markdown": "https://wpnews.pro/news/fusion-harness-how-to-combine-a-more-expensive-main-model-and-a-sidekick-model.md", "text": "https://wpnews.pro/news/fusion-harness-how-to-combine-a-more-expensive-main-model-and-a-sidekick-model.txt", "jsonld": 
"https://wpnews.pro/news/fusion-harness-how-to-combine-a-more-expensive-main-model-and-a-sidekick-model.jsonld"}}