{"slug": "self-harness-harnesses-that-improve-themselves", "title": "Self-Harness: Harnesses That Improve Themselves", "summary": "Researchers introduced Self-Harness, a paradigm in which an LLM-based agent iteratively improves its own operating harness without relying on human engineers or stronger external agents. On Terminal-Bench-2.0, Self-Harness raised held-out pass rates for MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5 from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1%, respectively, by converting model-specific weaknesses into concrete harness changes.", "body_md": "# Computer Science > Computation and Language\n\n[Submitted on 8 Jun 2026]\n\n# Self-Harness: Harnesses That Improve Themselves\n\n[View PDF](/pdf/2606.09498)\n\n[HTML (experimental)](https://arxiv.org/html/2606.09498v1)\n\nAbstract: The performance of LLM-based agents is jointly shaped by their base models and the harnesses that mediate their interaction with the environment. Because different models exhibit distinct behaviors, effective harness design is inherently model-specific. Yet agent harnesses are still largely engineered by human experts, a paradigm that scales poorly as modern LLMs become increasingly diverse and rapidly evolving. In this paper, we introduce Self-Harness, a new paradigm in which an LLM-based agent improves its own operating harness, without relying on human engineers or stronger external agents. We operationalize Self-Harness as an iterative loop with three stages: Weakness Mining, which identifies model-specific failure patterns from execution traces; Harness Proposal, which generates diverse yet minimal harness modifications tied to these failures; and Proposal Validation, which accepts candidate edits only after regression testing. We instantiate Self-Harness on Terminal-Bench-2.0 using a minimal initial harness and three base models from diverse families: MiniMax M2.5, Qwen3.5-35B-A3B, and GLM-5. 
Across all three models, Self-Harness consistently improves performance, with held-out pass rates increasing from 40.5% to 61.9%, 23.8% to 38.1%, and 42.9% to 57.1%, respectively. Qualitative analyses further show that Self-Harness does not simply add generic instructions, but effectively turns model-specific weaknesses into concrete, executable harness changes. These results suggest a path toward LLM-based agents that are not merely shaped by their harnesses, but can also participate in reshaping them.", "url": "https://wpnews.pro/news/self-harness-harnesses-that-improve-themselves", "canonical_source": "https://arxiv.org/abs/2606.09498", "published_at": "2026-06-22 19:45:05+00:00", "updated_at": "2026-06-24 20:12:58.725972+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "machine-learning", "ai-research", "natural-language-processing"], "entities": ["Self-Harness", "Terminal-Bench-2.0", "MiniMax M2.5", "Qwen3.5-35B-A3B", "GLM-5"], "alternates": {"html": "https://wpnews.pro/news/self-harness-harnesses-that-improve-themselves", "markdown": "https://wpnews.pro/news/self-harness-harnesses-that-improve-themselves.md", "text": "https://wpnews.pro/news/self-harness-harnesses-that-improve-themselves.txt", "jsonld": "https://wpnews.pro/news/self-harness-harnesses-that-improve-themselves.jsonld"}}