{"slug": "a-multi-tool-agent-harness-graph-routing-middleware-and-state-budgets", "title": "A multi-tool agent harness: graph routing, middleware, and state budgets", "summary": "LangChain's `deepagents` framework implements a multi-tool agent harness using a graph-based runtime with middleware, state channels, and a tool router that manages model-tool interaction loops. The system compiles a `StateGraph` from LangGraph, wrapping model calls in middleware layers for todo lists, filesystem access, subagents, summarization, and prompt caching while persisting conversation state to an SQLite checkpointer. This architecture transforms tools from simple function calls into components of a persistent, resumable graph runtime that can handle multiple internal model-tool cycles per user message.", "body_md": "I recently dug into LangChain's `deepagents` framework while building the prompt loop CLI. The interesting part was not just \"the agent can call tools.\" The interesting part was the harness around the model: the graph, middleware, state channels, tool router, and checkpoint system that make a long-running agent feel like a single conversation.\n\nHere is a deep dive into that harness. The goal is to answer two practical questions:\n\n- What are the moving pieces inside a Deep Agent? What's the harness behind multi-tool agents?\n- While building this, I noticed that running many tool calls across multiple LLM turns felt slow. How can tool calls be orchestrated with the LLM to make better use of tokens and time? In other words, what makes a good harness that keeps the loop smooth?\n\nAt the highest level, `deepagents` is a policy and middleware layer on top of LangChain's `create_agent()`, which compiles down to a LangGraph `StateGraph`.\n\nThe model is not running alone. 
It sits inside a loop:\n\n```\nmodel -> router -> tools -> router -> model -> ...\n```\n\nThe framework manages:\n\n- which tools are available;\n- how tool calls are executed;\n- what state survives between turns;\n- how old context is summarized;\n- how large tool results are offloaded;\n- how the graph resumes from a previous thread.\n\nIn my local setup, the agent is created roughly like this:\n\n```\nmodel = init_chat_model(\"anthropic:claude-sonnet-4-6\", temperature=0)\n\nbackend = FilesystemBackend(\n    root_dir=str(project_dir),\n    virtual_mode=False,\n)\n\nagent = create_deep_agent(\n    model=model,\n    tools=custom_eval_tools,\n    system_prompt=SYSTEM_PROMPT,\n    checkpointer=AsyncSqliteSaver(...),\n    backend=backend,\n)\n```\n\n`create_deep_agent()` is the main entry point: the call that turns a model, tools, backend, and prompt into a running agent. It does not build the execution graph from scratch; it assembles those pieces into a middleware stack, then hands them off to a factory that compiles everything into a runnable loop.\n\nThe default stack includes:\n\n- `TodoListMiddleware`: adds a `write_todos` planning tool; stores the todo list in state.\n- `FilesystemMiddleware`: adds file tools (`ls`, `read_file`, `write_file`, `edit_file`, `glob`, `grep`) and optionally `execute`.\n- `SubAgentMiddleware`: adds a `task` tool for launching isolated subagents.\n- Summarization: compacts long conversations and offloads old history to the backend.\n- Prompt caching: applies provider-specific caching when supported.\n- `PatchToolCallsMiddleware`: repairs incomplete tool calls if execution is interrupted.\n- `MemoryMiddleware` *(optional)*: injects `AGENTS.md` content into the system prompt.\n- `SkillsMiddleware` *(optional)*: injects skill definitions into the system prompt.\n- `AsyncSubAgentMiddleware` *(optional)*: adds tools for managing background subagents.\n- `HumanInTheLoopMiddleware` *(optional)*: pauses the 
graph for human approval.\n\nMiddleware is not a separate graph node. Each layer wraps the model call invisibly; the execution graph just sees a model node and a tools node with routing between them. The core idea: tools are not just functions bolted onto an LLM. They are part of a graph runtime with state, routing, persistence, and middleware hooks.\n\nA single user message can trigger several internal model/tool cycles.\n\n```\nsequenceDiagram\n    participant User\n    participant CLI\n    participant Graph as LangGraph\n    participant LLM\n    participant ToolNode\n    participant DB as SQLite Checkpointer\n\n    User->>CLI: Type message\n    CLI->>Graph: astream_events({messages: [HumanMessage]})\n    Graph->>DB: Load latest checkpoint for thread_id\n    Graph->>LLM: Model call with state.messages + system prompt + tool schemas\n    LLM-->>Graph: AIMessage with tool_calls: read_file, read_file, ls\n    Graph->>DB: Persist model step state\n    Graph->>ToolNode: Execute all pending tool calls\n    ToolNode-->>Graph: ToolMessage results\n    Graph->>DB: Persist tool step state\n    Graph->>LLM: Model call with previous messages + tool results\n    LLM-->>Graph: More tool calls or final answer\n    Graph->>DB: Persist next step state\n    Graph-->>CLI: Stream events and text chunks\n    CLI-->>User: Render tool starts/ends and assistant text\n```\n\nThe model can emit multiple tool calls in a single response. One turn might request:\n\n```\nread_file(...)\nread_file(...)\nls(...)\n```\n\nThose execute as one batch. After the results are appended to state, control returns to the model. So the loop is:\n\n```\nLLM turn -> zero or more tools -> LLM turn -> zero or more tools -> ...\n```\n\nThat distinction matters for both latency and cost.\n\nTools enter the system from two places.\n\nFirst, your application can pass custom tools. 
These are the tools I created for my prompt eval agent; they let it register, evaluate, test, and improve a prompt.\n\n```\ntools = [\n    *make_prompt_tools(project_dir),\n    *make_test_case_tools(project_dir),\n    *make_runner_tools(project_dir),\n    *make_report_tools(project_dir),\n]\n```\n\nSecond, middleware contributes tools: filesystem middleware adds file tools, todo middleware adds `write_todos`, subagent middleware adds `task`.\n\nAll tools are collected before the agent starts. The tool set is fixed for its lifetime; `execute` is included or omitted at initialization based on the backend type and never toggled after that.\n\nOne caveat: `grep` here is literal text search, not regex.\n\nThere is a less obvious part of tool routing that matters a lot in practice. `FilesystemMiddleware` intercepts every tool result in `wrap_tool_call()`. If the result text exceeds roughly 20,000 tokens (configurable), the middleware automatically:\n\n- Writes the full content to `/large_tool_results/{tool_call_id}` via the backend.\n- Replaces the result in the message with a truncated head+tail preview and a note telling the model to use `read_file` if it needs the rest.\n\nThe model is told this in its system prompt:\n\n> When a tool result is too large, it may be offloaded into the filesystem instead of being returned inline. In those cases, use `read_file` to inspect the saved result in chunks.\n\nThe agent carries state between every node and checkpoint. It holds not just the latest message but everything the graph needs to continue:\n\n- `messages`: the full transcript (user input, assistant replies, tool calls, tool results).\n- `todos`: the current todo list.\n- `files`: filesystem state from `FilesystemMiddleware`.\n- `async_subagent_jobs`: background job tracking, if enabled.\n\nSome fields (`memory_contents`, `skills_metadata`, `_summarization_event`) are private and never checkpointed. 
They reload from the backend each turn, which is why memory middleware re-reads `AGENTS.md` fresh on every model call instead of restoring it from a snapshot.\n\nThe dominant key by far is `messages`. Every tool call, result, and reply accumulates there, which is why long-running threads grow heavy over time.\n\nConceptually, a checkpointed state can look like this:\n\n```\n{\n  \"messages\": [\n    {\"type\": \"human\", \"content\": \"Evaluate this prompt\"},\n    {\n      \"type\": \"ai\",\n      \"content\": \"\",\n      \"tool_calls\": [\n        {\"id\": \"call_1\", \"name\": \"read_file\", \"args\": {\"path\": \"/repo/prompt.md\"}},\n        {\"id\": \"call_2\", \"name\": \"ls\", \"args\": {\"path\": \"/repo/.evals\"}}\n      ]\n    },\n    {\n      \"type\": \"tool\",\n      \"name\": \"read_file\",\n      \"tool_call_id\": \"call_1\",\n      \"content\": \"...\"\n    },\n    {\n      \"type\": \"tool\",\n      \"name\": \"ls\",\n      \"tool_call_id\": \"call_2\",\n      \"content\": \"...\"\n    },\n    {\"type\": \"ai\", \"content\": \"I found the prompt and test cases...\"}\n  ],\n  \"todos\": [\n    {\"content\": \"Read prompt\", \"status\": \"completed\"}\n  ],\n  \"files\": {},\n  \"_summarization_event\": null\n}\n```\n\nThe real state is richer than this: it includes provider metadata, channel versions, pending writes, and task identifiers. But this shape captures the important part: the graph is carrying a transcript of both conversation and computation.\n\nEach conversation thread is keyed by `thread_id` and can be resumed after a restart. Every graph step writes a snapshot, which is useful for durability but costly because state grows with message history.\n\nOne detail worth knowing: `deepagents` sets a default recursion limit of 1000 graph steps before raising an error. 
The CLI in this project overrides it to 100.\n\nAt first, my terminal streamed like this:\n\n```\n-> read_file\n-> read_file ✓ ✓\n-> ls ✓\n-> read_file ✓\n```\n\nThe flow was really slow, and I was tempted to blame the tools.\n\nBut local tools like `ls` and `read_file` are usually fast. The latency comes from the model/tool loop around them.\n\nEach batch has overhead:\n\n- another model call;\n- tool result serialization;\n- checkpoint writes;\n- larger message history on the next model request;\n- more routing and middleware work.\n\nThe tool might take 20 ms. The model turn around it might take several seconds. That is the hidden cost of agentic orchestration.\n\nThe bottleneck is not the tools. It is how often the agent returns to the model, how much context each tool returns, and how much state you carry forward. A good harness manages three budgets:\n\n- Model turns.\n- Tool output size.\n- Persistent state growth.\n\nTo reduce unnecessary model/tool loops:\n\n- Encourage batched tool calls when reads are independent.\n- Avoid returning huge raw payloads when a summary or filtered result is enough.\n- Store large artifacts outside `messages` and reference them by path. (`FilesystemMiddleware` already does this automatically for results over ~20K tokens, but you still pay for the model turns that read those files back.)\n- Use a faster orchestrator model. It doesn't need to be the smartest, just effective for your use case.\n\nThe most important one: design tools that return decision-ready output.\n\nThink of Deep Agents as a runtime harness:\n\n```\nLLM reasoning\n  + tool schemas\n  + graph routing\n  + middleware\n  + state channels\n  + checkpointing\n  + summarization\n  + filesystem / backend storage\n```\n\nThe LLM is only one part of the system. 
The harness decides what the LLM sees, what tools it can call, how results are stored, when control returns to the model, and how the conversation survives over time.\n\nDesigning good agents means designing the loop intentionally: batch tools when possible, keep tool outputs small, manage checkpointed state, and make each model turn count.\n\nThe power comes from the loop.", "url": "https://wpnews.pro/news/a-multi-tool-agent-harness-graph-routing-middleware-and-state-budgets", "canonical_source": "https://github.com/Bella3202019/promptloop/blob/main/docs/The_Harness_Behind_Deep_Agent.md", "published_at": "2026-05-28 16:18:18+00:00", "updated_at": "2026-05-28 16:32:51.703630+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-tools", "ai-infrastructure", "ai-research"], "entities": ["LangChain", "deepagents", "LangGraph", "StateGraph", "Claude Sonnet", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/a-multi-tool-agent-harness-graph-routing-middleware-and-state-budgets", "markdown": "https://wpnews.pro/news/a-multi-tool-agent-harness-graph-routing-middleware-and-state-budgets.md", "text": "https://wpnews.pro/news/a-multi-tool-agent-harness-graph-routing-middleware-and-state-budgets.txt", "jsonld": "https://wpnews.pro/news/a-multi-tool-agent-harness-graph-routing-middleware-and-state-budgets.jsonld"}}