{"slug": "hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness", "title": "Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights", "summary": "Hexo Labs released SIA (Self-Improving AI) as an open-source framework under an MIT license this week, enabling an AI agent to edit both its scaffold and model weights within a single self-improving loop. The system achieved a 20.1 percentage-point accuracy gain on LawBench and a 91.9% runtime reduction on a CUDA kernel task by combining harness updates with weight updates via LoRA and PPO. The release marks the first system to simultaneously modify both components, outperforming previous state-of-the-art results across three benchmark domains.", "body_md": "Most AI agents stop improving once a human stops tuning them. The model is fixed. The scaffold around it is fixed. Hexo Labs wants to move both at once. It released [SIA (Self-Improving AI)](https://github.com/hexo-ai/sia) this week as an open-source framework under an MIT license.\n\nThe core claim of this research is narrow but concrete. SIA edits both the agent’s scaffold and the model’s weights inside one self-improving loop.\n\n**What is SIA (Self-Improving AI)**\n\nSIA splits a task-specific agent into two parts. The first is the harness, also called the scaffold. That covers the system prompt, tool-dispatch logic, retry policy, and answer-extraction code. The second part is the model weights themselves.\n\nThree LLM components drive the loop. A Meta-Agent writes the initial scaffold from a task specification and any reference code. A Task-Specific Agent runs the task and logs every step. A Feedback-Agent then reads that full trajectory and decides what to change.\n\nThat decision is the key idea. After each run, the Feedback-Agent picks one of two actions. It can rewrite the scaffold while weights stay fixed. 
Or it can trigger a weight update while the scaffold stays fixed.\n\nThe base model is openai/gpt-oss-120b. Weight updates use LoRA, a low-rank adapter, at rank 32. The Meta-Agent and Feedback-Agent both run on Claude Sonnet 4.6. Training runs on H100 GPUs through Modal, the team’s RL platform.\n\nThe research team labels its two operating points SIA-H and SIA-W+H. SIA-H uses harness updates only. SIA-W+H adds weight updates on top.\n\n**The Benchmark Case**\n\nThe research team tested SIA on three deliberately different domains. The pattern held across all three. Weight updates added gains beyond what scaffold editing alone reached. “Initial” is the base model through the Meta-Agent’s first scaffold, before any feedback.\n\n| Task | Initial | Prev. SOTA | SIA-H (harness only) | SIA-W+H (harness + weights) |\n|---|---|---|---|---|\n| LawBench (top-1 acc) | 13.5% | 45.0% | 50.0% | 70.1% |\n| AlphaEvolve TriMul (reward) | 0.105 | 1.292 | 0.120 | 1.475 |\n| Denoising (mse_norm) | 0.048 | 0.240 | 0.241 | 0.289 |\n\nOn LawBench, the task is 191-class Chinese criminal charge classification. Harness iteration built a TF-IDF plus LinearSVC pipeline and plateaued at 50.0%. Weight updates via PPO then pushed accuracy to 70.1%. That is a 20.1 percentage-point gain over the harness-only best.\n\nThe TriMul task asks for a custom CUDA kernel on an H100 GPU. The kernel computes a core operation in AlphaFold2’s Evoformer module. Scaffold edits reached a 1.14× speedup over baseline. Weight updates then drove runtime from 12,483 to 1,017 microseconds. That is a 91.9% reduction from the harness-only peak.\n\nOne honest caveat appears in the same chart. The coding agent Claude Code reached 1.50× on TriMul unaided, beating SIA-H’s 1.14×. SIA-W+H still led overall at 14.02×.\n\nFor denoising, the agent tunes MAGIC, a single-cell RNA imputation method. Harness sweeps over its hyperparameters settled at 0.241 mse_norm. 
The first weight-update checkpoint added a two-line step that no scaffold produced. It rounded imputed counts to non-negative integers, lifting the score to 0.289.\n\n**How the Feedback-Agent Picks Its Move**\n\nSIA does not run one fixed RL recipe. The Feedback-Agent selects a training algorithm based on the reward signal it observes.\n\nOn LawBench, the reward was a clean outcome-based scalar, so it used PPO with GAE. On TriMul, most kernels failed to compile, so it used entropic advantage weighting. That method up-weights rare high-reward rollouts. On denoising, it used GRPO, which eliminates the value network entirely.\n\nThe research team also lists REINFORCE with KL-to-base, DPO, and best-of-N behavioural cloning. Each maps to a different reward shape and failure risk.\n\n**Strengths and What to Watch**\n\n**Strengths:**\n\n- First system to edit both scaffold and weights in one loop, per the authors’ comparison table.\n- Consistent gains over prior SOTA across three unrelated domains.\n- Open source under MIT, installable as sia-agent, with four bundled tasks.\n- Algorithm choice is conditioned on observed rewards, not a fixed schedule.\n\n**What to Watch:**\n\n- The paper reports three tasks; broader algorithm-selection results are deferred.\n- Both levers optimise the same fixed verifier, risking coupled Goodhart effects.\n- The researchers warn the joint fixed point may be fragile under perturbation.\n\n**Marktechpost’s Visual Explainer**\n\n*SIA: Self Improving AI with Harness & Weight Updates* (arXiv:2605.27276)\n\n[github.com/hexo-ai/sia](https://github.com/hexo-ai/sia)\n\n**Key Takeaways**\n\n- SIA is the first self-improving loop that edits both an agent's scaffold and its model weights.\n- A Feedback-Agent reads each run's full trajectory, then picks a harness rewrite or weight update.\n- Combining both levers beat scaffold-only on all three tasks: LawBench, TriMul kernels, scRNA-seq denoising.\n- Harness edits add 
software-engineering hygiene; weight updates surface domain knowledge no prompt reaches.\n- Open source under MIT (hexo-ai/sia), built on gpt-oss-120b with LoRA rank 32.\n\nCheck out the [Repo](https://github.com/hexo-ai/sia) and the [Research Paper](https://arxiv.org/pdf/2605.27276).", "url": "https://wpnews.pro/news/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness", "canonical_source": "https://www.marktechpost.com/2026/05/29/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness-and-the-model-weights/", "published_at": "2026-05-29 07:28:37+00:00", "updated_at": "2026-05-29 08:12:00.029035+00:00", "lang": "en", "topics": ["ai-agents", "ai-research", "machine-learning", "large-language-models"], "entities": ["Hexo Labs", "SIA", "Self-Improving AI", "Claude Sonnet 4.6", "Modal", "LoRA", "openai/gpt-oss-120b", "H100 GPUs"], "alternates": {"html": "https://wpnews.pro/news/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness", "markdown": "https://wpnews.pro/news/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness.md", "text": "https://wpnews.pro/news/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness.txt", "jsonld": "https://wpnews.pro/news/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness.jsonld"}}