{"slug": "don-t-make-the-agent-re-run-the-test-suite-to-find-the-failure", "title": "Don't Make the Agent Re-Run the Test Suite to Find the Failure", "summary": "The article describes a common failure mode where AI coding agents re-run entire test suites to identify which tests failed, despite the failure information being present in the initial output. The proposed solution is to use the Unix `tee` command to pipe test output into a gitignored `.test-output/` directory, allowing the agent to `grep` the persisted log file for failures instead of re-running the expensive test suite. This approach saves time by eliminating redundant test executions while preserving real-time terminal feedback.", "body_md": "Here is a small failure mode that cost me time for longer than it should have.\nThe agent would run the test suite. Tests would fail. The agent would announce that tests failed. Then, when it needed to know which tests failed, it would either guess, ask, or run the suite a second time to scrape the output. Sometimes it would run it a third time. The failing tests were right there in the output of the first run; the agent didn't read them carefully enough to remember.\nThis is not the model being lazy. It is the model being honest about its attention. Test output is voluminous, the failure summary is buried at the bottom, and by the time the agent is reasoning about the next step (\"which test failed, what assertion, what file\"), the relevant lines have scrolled past whatever the agent actually retained. The cheapest way to get the information back is to run the command again.\nRunning the command again is the wrong answer. A full suite takes minutes. Doing it twice to learn something the first run already told you is a tax on every red-test loop, and the loops compound.\n`tee` everything to a gitignored directory\nThe rule I added is short:\nEvery test command pipes through `tee` into `.test-output/<scope>.log`. The directory is gitignored. 
When you need to inspect failures, `grep` the log; do not re-run the suite.\nThe agent's commands now look like:\nmake test 2>&1 | tee .test-output/full.log\nmake test-parallel 2>&1 | tee .test-output/parallel.log\nmake vitest 2>&1 | tee .test-output/vitest.log\nThat is it. The fix is one Unix utility, deployed deliberately.\nTwo practical caveats: `tee` does not create the directory, so `.test-output/` has to exist before the first run, and a pipeline exits with the status of its last command, so in bash the agent needs `set -o pipefail` if it relies on the exit code to detect a failing suite.\n`tee` is the right tool\nThe trick is that `tee` does not replace the stream; it forks it. STDOUT still flows to the terminal in real time. The agent still sees the test run as it happens, still gets the streaming feedback that lets it react to a hang or an obvious early failure. Nothing about the foreground experience changes.\nWhat changes is that the run is also persisted. After the command exits, the output exists as a file. The agent can `grep -n FAIL .test-output/full.log`, jump to the failing test, read three lines of context, and decide what to do without burning another full suite run to recover information.\nYou could imagine alternatives. Redirect everything to a file and tail it (loses interactive feedback). Tell the agent to \"remember the test output\" (the failure mode this is supposed to fix). Increase the agent's context window (treats the symptom, not the cause). The `tee` approach is boring, which is its strength. It uses a tool that has been in every Unix shell since 1973, costs nothing, and survives every framework choice the project might make later.\n`.test-output/` is committed nowhere. It is ephemeral: overwritten on every run, never inspected by humans, never reviewed in a PR. Making it a real directory rather than `/tmp` keeps it scoped to the project, which means a `grep` from the repo root finds it without thinking.\nI also keep the filenames stable. `full.log`, `parallel.log`, `vitest.log`; not timestamped. The agent never has to ask \"which file is the latest?\" The latest is the only one. If you want history, the agent can copy it before re-running. 
By default, nobody needs history; they need the most recent failure.\nThe interesting thing here is not `tee`. It is the shape of the fix.\nAn agent reading a 4,000-line test run and an agent grepping a file for `FAIL` are doing the same work in principle, but the second one is the work the tool is good at. Long, streaming output is something an agent should read past (most of it is success noise) and only return to when a question demands it. Persisting the output to disk turns the question \"what failed?\" from \"scroll back through my context and hope\" into \"run a deterministic command and read the answer.\"\nWherever an agent re-runs a command to recover information that was already produced, there is a `tee` waiting to be added. Build output. Type-check output. Linter output. Long-running scripts that print progress. Any time the command is expensive and the output is the artifact, persist the artifact. The rule shape (fork the stream, persist the artifact, grep the file) generalizes far past test runs.\nThe agent does not need a better memory. It needs the harness to stop throwing away information that was already on the screen.", "url": "https://wpnews.pro/news/don-t-make-the-agent-re-run-the-test-suite-to-find-the-failure", "canonical_source": "https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427", "published_at": "2026-05-22 18:22:38+00:00", "updated_at": "2026-05-22 18:31:54.280870+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/don-t-make-the-agent-re-run-the-test-suite-to-find-the-failure", "markdown": "https://wpnews.pro/news/don-t-make-the-agent-re-run-the-test-suite-to-find-the-failure.md", "text": "https://wpnews.pro/news/don-t-make-the-agent-re-run-the-test-suite-to-find-the-failure.txt", "jsonld": "https://wpnews.pro/news/don-t-make-the-agent-re-run-the-test-suite-to-find-the-failure.jsonld"}}