This field test was against LangChain.
The issue was LangChain #34818:
https://github.com/langchain-ai/langchain/issues/34818 The reported problem was that agent streaming behaved differently when structured output was enabled.
Without structured output, the agent could stream natural language before calling a tool.
With structured output enabled through ToolStrategy
, that intermediate text disappeared.
That matters because it breaks a common agent experience:
“I’m going to check that now...”
tool call
“Here is what I found...”
final answer The visible symptom was:
structured output blocks intermediate streaming
But the useful diagnostic boundary was narrower:
intermediate agent stream vs final structured-output enforcement
Those are different moments in the agent lifecycle.
An intermediate agent turn may need to stream natural language before a tool call.
The final answer may need to be constrained into a structured schema.
Those two requirements should not necessarily take ownership of the same turn.
The local repair candidate focused only on the ToolStrategy
path.
It does not redesign LangChain streaming.
It does not change all structured-output behavior.
It does not touch ProviderStrategy
, which is more design-sensitive.
The repair keeps the final structured-output protection intact, but avoids forcing the structured-output tool choice too early on the first model turn when real tools are available.
In plain terms:
Let the agent behave like an agent before the real tool call. Then enforce structured output when it is time to produce the final structured answer.
A focused regression proved the failure before repair and passed after repair.
Validation also passed across the relevant response-format tests, agent-streaming tests, formatting, type checking, and diff checks.
ToolStrategy
repair candidate and regression testThe important part of this field test is the shape of the repair.
The broad symptom looked like:
structured output breaks streaming
The smaller repair boundary was:
structured output enforcement takes ownership one turn too early
That is a very different diagnosis.
It means the repair does not need to weaken structured output.
It does not need to rewrite agent streaming.
It does not need to change every strategy.
It only needs to respect the boundary between the first real-tool turn and the final structured-output turn.
That is the larger pattern emerging from these field tests.
Scarab Diagnostic Suite began as a way to build with AI without letting the codebase drift away from its own truth.
AI agents can move fast, but fast code is not the same as truthful code. A repo has boundaries. It has ownership rules. It has contracts. It has places where truth is supposed to live.
But these field tests are showing something broader:
software systems drift too.
Not just AI.
Developers drift.
Teams drift.
Mature platforms drift.
A lot of hard software bugs are places where one part of the system is still operating under one truth while another part has started obeying a different one.
In this field test, the drift was timing and ownership:
the structured-output path began enforcing final-answer behavior before the agent had completed the intermediate tool-use step.
That is the kind of boundary Scarab is built to surface.
The repair is small because the boundary is specific.
That does not make it weak.
It makes it safer.
A lot of sprawling software problems begin at one precise place where timing, ownership, or truth crosses the wrong boundary.
Find that place, and the repair can be much smaller than the symptom suggests.