What 1k Harness Experiments Taught Me About Self-Improving Agents
A researcher conducted over 1,000 experiments to test whether an AI agent could autonomously improve its own "harness" — the system wrapping a language model to interact with tasks on Terminal Bench. The agent repeatedly…