TestSmith has two distinct audiences that need context about the project: AI agents that work on the TestSmith codebase (helping develop and extend it), and the LLM that generates test code for your project at runtime. These are different problems with different solutions.
When an AI agent opens TestSmith to fix a bug or add a feature, it needs to understand the codebase structure without reading every file. A single large context file doesn't work well — an agent fixing a retry bug doesn't need to know the Java driver's fixture generation logic.
The solution is a CLAUDE.md hierarchy:
CLAUDE.md ← package map, invariants, dependency direction
internal/domain/CLAUDE.md ← interfaces, key types, "add a field" checklist
internal/generation/CLAUDE.md ← pipeline data flow, verifier selection
internal/llm/CLAUDE.md ← middleware stack, batch vs fan-out, cache key
internal/projectknowledge/CLAUDE.md ← TESTSMITH.md hierarchy, budget tiers
internal/drivers/CLAUDE.md ← how to add an adapter or language driver
The root file is the map. The per-package files are the territory. An agent touching the LLM retry logic loads internal/llm/CLAUDE.md and never sees the driver or generation docs.
The root file contains three things that every agent needs regardless of task:
- The package map
- The dependency direction (domain never imports other internal packages; drivers never import generation)
- The invariants (GeneratedFile.Language must always be set; resolveAction has specific rules for fixture vs. non-fixture files)
Per-package files contain the "read this before touching this package" context: data flow diagrams for the pipeline, the middleware stack for the LLM layer, the adapter registration pattern for drivers.
When Claude Code loads a file in a package, it automatically reads that package's CLAUDE.md. The agent gets exactly what it needs, nothing more.
TESTSMITH.md is the file TestSmith injects into prompts when generating tests for your project. It's a conventions file you maintain alongside your source code.
Two levels are merged at generation time:
<project-root>/TESTSMITH.md ← always loaded; project-wide framework, mock style
<source-dir>/TESTSMITH.md ← optional; package-level overrides
Example root TESTSMITH.md:
Framework: pytest
Mock style: pytest-mock (use `mocker.patch`, not `unittest.mock.patch`)
Assertion style: plain assert statements
Services are in `src/services/`. Each service has a single public class.
Tests go in `tests/` mirroring the `src/` structure.
Example per-directory override in src/services/payment/TESTSMITH.md:
This module integrates with Stripe. Mock all `stripe.*` calls.
Use `pytest.mark.vcr` for HTTP interaction tests.
The root file is loaded once at startup and cached in ProjectContext. The per-directory file is merged lazily, only when a file in that directory is being generated. A large monorepo never loads context it doesn't need.
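The load-once root plus lazy per-directory merge can be sketched roughly as follows. This is a minimal illustration, not TestSmith's actual code: the `readFunc` hook, the `KnowledgeFor` method, the struct fields, and the way the two files are concatenated are all assumptions made for the example. Only the `ProjectContext` name and the TESTSMITH.md file name come from the text above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// readFunc is a hypothetical file-reading hook; it reports false when the file is absent.
type readFunc func(p string) (string, bool)

// ProjectContext caches the root conventions and any directory overrides seen so far.
// The fields are assumptions for this sketch.
type ProjectContext struct {
	root string            // root TESTSMITH.md, read once at construction
	dirs map[string]string // per-directory overrides, filled on first use
	read readFunc
}

// NewProjectContext reads the root file once; a missing root file yields an empty string.
func NewProjectContext(projectRoot string, read readFunc) *ProjectContext {
	root, _ := read(path.Join(projectRoot, "TESTSMITH.md"))
	return &ProjectContext{root: root, dirs: map[string]string{}, read: read}
}

// KnowledgeFor returns the root conventions followed by the override for dir, if any.
func (pc *ProjectContext) KnowledgeFor(dir string) string {
	override, seen := pc.dirs[dir]
	if !seen {
		override, _ = pc.read(path.Join(dir, "TESTSMITH.md"))
		pc.dirs[dir] = override // misses are cached too, so each directory is read at most once
	}
	if override == "" {
		return pc.root
	}
	return strings.TrimRight(pc.root, "\n") + "\n\n" + override
}

func main() {
	files := map[string]string{
		"TESTSMITH.md":                      "Framework: pytest\n",
		"src/services/payment/TESTSMITH.md": "Mock all `stripe.*` calls.\n",
	}
	read := func(p string) (string, bool) { s, ok := files[p]; return s, ok }
	pc := NewProjectContext("", read)
	fmt.Print(pc.KnowledgeFor("src/services/payment"))
}
```

Injecting the reader keeps the sketch testable without touching disk; a real implementation would presumably read from the filesystem instead.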
Both files go into the system prompt, not the user prompt. This matters because the user prompt is subject to a configurable token budget (PromptTokenBudget, default 6,000 tokens) with a priority-based trim:
| Priority | Content | Dropped when? |
|---|---|---|
| 1 (never) | Source code | Never |
| 2 | Internal dep signatures | Budget exceeded after source |
| 3 | Style snippet from nearby tests | Dropped first |
Project knowledge is exempt from this budget entirely — it stays in the system prompt regardless of how large the source file is.
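One way to picture the priority-based trim is the sketch below. It assumes whole sections are dropped (rather than truncated) and that token counts are already known; the `Section` type, the `Trim` function, and the numbers in `main` are hypothetical and not taken from TestSmith.

```go
package main

import "fmt"

// Section is one block of user-prompt content. Priority 1 is never dropped.
type Section struct {
	Name     string
	Priority int // 1 = source code, 2 = internal dep signatures, 3 = style snippet
	Tokens   int // hypothetical precomputed token count
}

// Trim drops whole sections, highest priority number first, until the total
// fits the budget or only priority-1 content remains.
func Trim(sections []Section, budget int) []Section {
	kept := append([]Section(nil), sections...) // copy so the caller's slice is untouched
	for total(kept) > budget {
		drop := -1
		for i, s := range kept {
			if s.Priority > 1 && (drop == -1 || s.Priority > kept[drop].Priority) {
				drop = i
			}
		}
		if drop == -1 {
			break // only source code is left; it stays even if over budget
		}
		kept = append(kept[:drop], kept[drop+1:]...)
	}
	return kept
}

// total sums the token counts of the given sections.
func total(sections []Section) int {
	sum := 0
	for _, s := range sections {
		sum += s.Tokens
	}
	return sum
}

func main() {
	prompt := []Section{
		{"source", 1, 5200},
		{"dep signatures", 2, 600},
		{"style snippet", 3, 900},
	}
	// 6,700 tokens against a 6,000 budget: the style snippet goes first.
	for _, s := range Trim(prompt, 6000) {
		fmt.Println(s.Name)
	}
}
```

Because project knowledge lives in the system prompt, it never appears among the sections passed to a function like this.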
Beyond TESTSMITH.md, TestSmith also mines conventions from existing tests in the same directory: up to 5 files, capped at 80 lines total. This gives the model real examples of the project's test style without requiring the developer to maintain a conventions doc.
This is cheaper and more accurate than a hand-written guide: it automatically reflects the actual test patterns in use, and it updates itself as tests evolve. If your team starts using a new assertion pattern, the next generation run picks it up.
The third piece is the dep index: at the start of a --all run, TestSmith analyses every source file once and builds a modulePath → SourceAnalysis map. When generating tests for payment.go, it can pull the public API signature of discount.go (which payment.go imports) from memory:
// In the prompt:
// Internal dependency signatures:
// discount.ApplyPromoCode(order Order, code string) (Order, error)
// discount.ValidateCode(code string) bool
This tells the model what the real interface looks like so it generates test doubles that match the actual signatures — not invented ones.
In watch mode, when a file changes, only that file's entry is refreshed. The rest of the index stays warm between regens.
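A rough sketch of such an index, with a single-entry refresh for watch mode, might look like this. Only the modulePath → SourceAnalysis mapping comes from the text; the `Signatures` field, the `analyzeFunc` hook, and the `BuildDepIndex`, `Refresh`, and `SignaturesFor` names are hypothetical.

```go
package main

import "fmt"

// SourceAnalysis holds what the prompt needs from one file. The single field
// is a hypothetical simplification of the real type.
type SourceAnalysis struct {
	Signatures []string
}

// analyzeFunc is a hypothetical stand-in for the per-language analyser.
type analyzeFunc func(modulePath string) SourceAnalysis

// DepIndex maps a module path to its analysis for the duration of a run.
type DepIndex struct {
	entries map[string]SourceAnalysis
	analyze analyzeFunc
}

// BuildDepIndex analyses every source file exactly once.
func BuildDepIndex(modulePaths []string, analyze analyzeFunc) *DepIndex {
	idx := &DepIndex{entries: make(map[string]SourceAnalysis, len(modulePaths)), analyze: analyze}
	for _, p := range modulePaths {
		idx.entries[p] = analyze(p)
	}
	return idx
}

// Refresh re-analyses a single changed file; all other entries stay untouched.
func (idx *DepIndex) Refresh(modulePath string) {
	idx.entries[modulePath] = idx.analyze(modulePath)
}

// SignaturesFor collects the public signatures of the given imports from memory.
func (idx *DepIndex) SignaturesFor(imports []string) []string {
	var out []string
	for _, p := range imports {
		out = append(out, idx.entries[p].Signatures...)
	}
	return out
}

func main() {
	analyze := func(p string) SourceAnalysis {
		if p == "discount" {
			return SourceAnalysis{Signatures: []string{
				"discount.ApplyPromoCode(order Order, code string) (Order, error)",
				"discount.ValidateCode(code string) bool",
			}}
		}
		return SourceAnalysis{}
	}
	idx := BuildDepIndex([]string{"payment", "discount"}, analyze)
	for _, sig := range idx.SignaturesFor([]string{"discount"}) {
		fmt.Println(sig)
	}
}
```

In watch mode, a file-change event would call `Refresh` for the changed path only, leaving the rest of the map as it was.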
The two layers solve different problems:
Agent context is about development-time navigation. It's hierarchical, human-readable, and loaded selectively. It describes architecture and invariants. It lives in the repo and is maintained alongside the code it describes.
Runtime LLM context is about generation-time quality. It's merged from two levels, injected into system prompts, and exempt from token budgets. It describes conventions and patterns specific to the target project — things an LLM can't infer from source code alone.
Conflating the two leads to either bloated system prompts (dumping agent context into every generation request) or under-informed agents (giving them only the user-facing conventions doc with no architectural guidance). Keeping them separate means each audience gets exactly what it needs.
Next: the cross-platform bugs we hit shipping a Go CLI — detector boundary escapes and Windows path separators.