{"slug": "a-3-step-agent-cost-me-4-20-agenttrace-showed-me-the-o-n-tool-call-hiding-in", "title": "A 3-step agent cost me $4.20. agenttrace showed me the O(n²) tool call hiding in plain sight.", "summary": "A simple three-step AI agent unexpectedly cost $4.20 because of a hidden bug in the cite-check step: the model made nine calls instead of one, and each iteration re-attached the full prior history, so input tokens grew quadratically. The author used a Rust crate called `agenttrace-rs` to aggregate LLM calls into runs and generate a by-step cost breakdown, which revealed the issue. After replacing the full-history re-attachment with a sliding window, the same run cost only $0.14, about 30 times cheaper.", "body_md": "I ran a small agent. Three steps. One web search, one summarize, one cite-check. I had budgeted maybe 12 cents.\n\nThe bill at the end of the run was $4.20.\n\nI knew something was off, but the per-call invoice line items were not telling me anything useful. They were just a list of `messages.create` calls. I needed to group them into the run that produced them and look at the cost shape.\n\nThat is the gap `agenttrace-rs` fills. It is a Rust crate that aggregates LLM calls into runs and gives you cost, latency, and by-model and by-step breakdowns.\n\n## The breakdown that surfaced the bug\n\n```rust\nuse agenttrace::{Trace, Run};\n\nlet mut trace = Trace::new();\n\nlet run = trace.start_run(\"cite-check-agent\");\n\nrun.record_call(claude_cost::estimate(&req1, &resp1));\nrun.record_call(claude_cost::estimate(&req2, &resp2));\nrun.record_call(claude_cost::estimate(&req3, &resp3));\n// ... 
and so on for every tool result/follow-up step\n\nlet summary = run.finish();\nprintln!(\"{}\", summary.report());\n```\n\nThe report it printed for the $4.20 run:\n\n```\nrun: cite-check-agent  duration: 38.4s  total_cost_usd: 4.2031\ncalls: 11\np50_latency_ms: 2710\np95_latency_ms: 4920\n\nby-model:\n  claude-opus-4-7:    9 calls  $4.1880  avg_input_tok: 18,420  avg_output_tok: 540\n  claude-haiku-4:     2 calls  $0.0151  avg_input_tok: 1,200   avg_output_tok: 180\n\nby-step:\n  step_1_search:       1 call   $0.0184  1,800 in   220 out\n  step_2_summarize:    1 call   $0.0312  3,100 in   280 out\n  step_3_cite_check:   9 calls  $4.1535  avg 22,400 in   avg 510 out\n```\n\nStep 3 was supposed to be one call. It was nine. And the average input was 22,400 tokens per call. That is the smoking gun.\n\n## What was actually happening\n\nThe cite-check step had a tool the model could call to fetch a source URL. When the model called the tool, I appended the tool result to the messages list and re-called `messages.create`. Standard pattern.\n\nWhat I missed: every iteration was re-attaching the full prior history, including the search results from step 1 and the summary from step 2. So call 4 had everything from calls 1-3 in its input. Call 5 had everything from calls 1-4. And so on. Input tokens grew linearly per call, so total input tokens grew quadratically over the step.\n\nThe model kept calling the tool because the prompt was structured ambiguously. So I had an unbounded loop hiding behind a 9-iteration tool dance: O(n²) input tokens for n iterations.\n\nThe fix was small. I stopped re-attaching the full history on each tool turn and used a sliding window. 
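To make the cost shape concrete, here is a minimal Rust sketch comparing the two strategies. It is not agenttrace code and not the exact fix from this run: the token model (a fixed `base` prompt plus `per_turn` tokens for every earlier tool turn still attached) and all function names are hypothetical. With `base = 1000`, `per_turn = 500`, and nine calls, re-attaching everything sums to 27,000 input tokens, matching the closed form `n * base + per_turn * n * (n - 1) / 2`, while a two-turn window sums to 16,500 because the per-call input stops growing once `k` reaches the window size.

```rust
// Hypothetical token model, not agenttrace's API: each call carries a fixed
// `base` prompt, and every earlier tool turn still attached adds `per_turn`.

/// Input tokens for call `k` (0-indexed) when the full history is
/// re-attached: all `k` earlier tool turns ride along.
pub fn full_history_input(base: u64, per_turn: u64, k: u64) -> u64 {
    base + per_turn * k
}

/// Input tokens for call `k` with a sliding window that keeps at most
/// `window` earlier tool turns.
pub fn sliding_window_input(base: u64, per_turn: u64, k: u64, window: u64) -> u64 {
    base + per_turn * k.min(window)
}

/// Total input tokens over `n` calls when re-attaching everything.
/// Grows as O(n^2): n * base + per_turn * n * (n - 1) / 2.
pub fn total_full_history(base: u64, per_turn: u64, n: u64) -> u64 {
    (0..n).map(|k| full_history_input(base, per_turn, k)).sum()
}

/// Total input tokens over `n` calls with the window. Grows as O(n) once
/// `n` exceeds `window`.
pub fn total_sliding_window(base: u64, per_turn: u64, n: u64, window: u64) -> u64 {
    (0..n).map(|k| sliding_window_input(base, per_turn, k, window)).sum()
}

/// Message-level version of the idea: keep the pinned prefix (for example
/// the task instructions) plus only the last `window` tool turns.
pub fn windowed<T: Clone>(pinned: &[T], turns: &[T], window: usize) -> Vec<T> {
    let start = turns.len().saturating_sub(window);
    let mut out = pinned.to_vec();
    out.extend_from_slice(&turns[start..]);
    out
}
```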
Re-ran the same run cold:\n\n```\nrun: cite-check-agent  duration: 11.2s  total_cost_usd: 0.1432\ncalls: 5\np50_latency_ms: 2200\np95_latency_ms: 3050\n\nby-model:\n  claude-opus-4-7:    3 calls  $0.1290\n  claude-haiku-4:     2 calls  $0.0142\n\nby-step:\n  step_1_search:      1 call   $0.0181\n  step_2_summarize:   1 call   $0.0308\n  step_3_cite_check:  3 calls  $0.0943\n```\n\n14 cents. About 30x cheaper. I would not have found the bug without the by-step grouping.\n\n## What agenttrace actually does\n\n```rust\nuse agenttrace::{Trace, Tag};\n\nlet mut trace = Trace::new();\nlet run = trace.start_run(\"my-agent\");\n\nrun.tag(\"user_id\", \"u_8821\");\nrun.tag(\"step\", \"search\");\n\n// for each LLM call\nrun.record(agenttrace::CallRecord {\n    model: \"claude-opus-4-7\".into(),\n    input_tokens: 1800,\n    output_tokens: 220,\n    cache_read_tokens: 0,\n    cache_write_tokens: 0,\n    latency_ms: 2710,\n    cost_usd: 0.0184,\n    tags: vec![Tag::step(\"search\")],\n});\n\nlet summary = run.finish();\ntrace.append(summary);\n\n// serialize all runs\nlet json = serde_json::to_string(&trace.runs())?;\n```\n\nIt is a thin aggregator. It does not call the API. It does not make pricing decisions. You feed it call records (typically computed from `claude-cost` or your own pricing function), and it composes them into a run with cost, p50/p95, and per-tag breakdowns.\n\n## Why p95 matters more than mean\n\n`avg_latency_ms` lies. Take a five-call run where one slow call took 12 seconds (the model was thinking) and the other four returned in 2: the mean is 4 seconds, a number that matches none of the actual calls. The p95 shows the actual tail. For agents, this is the number that tells you whether your user-facing experience is going to feel snappy or laggy. agenttrace exposes p50, p95, and p99 by default.\n\n## Composing with other crates\n\n- `claude-cost` for the per-call cost estimate (cache-aware).\n- `cachebench` to see the cache hit ratio across the run.\n
- `llm-circuit-breaker` to short-circuit a run when an upstream is degraded, so you do not pay $4.20 to discover that.\n\nA typical pipeline in our service looks like this: `cachebench` records hit/miss → `claude-cost` computes cost given hits → `agenttrace` aggregates into a run summary.\n\n## What this does not solve\n\n- It does not store traces durably. `Trace` is in-memory. You serialize to disk or to a remote sink yourself. I do that with a one-line `serde_json::to_writer` to a SQLite blob.\n- It does not visualize. There is no UI. You get JSON or text reports. If you want a flamegraph, pipe to your own viewer.\n- It does not capture the request bodies. Pair it with `agenttap` for that. agenttrace is the cost/latency layer, not the wire layer.\n- The tagging system is flat. There is no nested-span model. If you need that, OpenTelemetry is the right tool, and `otel-genai-bridge-rs` can translate between conventions.\n\nThe crate is about 600 lines of pure Rust. No async lock-in.\n\nRepo: [https://github.com/MukundaKatta/agenttrace-rs](https://github.com/MukundaKatta/agenttrace-rs)\n\ncrates.io: `agenttrace = { package = \"agenttrace-rs\", version = \"0.1\" }`\n\nPart of a small Rust stack I publish for AI agent plumbing: cost, retry, breakers, repair, trace. 
Built piece by piece from real incidents.", "url": "https://wpnews.pro/news/a-3-step-agent-cost-me-4-20-agenttrace-showed-me-the-o-n-tool-call-hiding-in", "canonical_source": "https://dev.to/mukundakatta/a-3-step-agent-cost-me-420-agenttrace-showed-me-the-on-tool-call-hiding-in-plain-sight-3omp", "published_at": "2026-05-21 01:52:32+00:00", "updated_at": "2026-05-21 02:01:33.529789+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "artificial-intelligence"], "entities": ["agenttrace-rs", "Claude", "agenttrace"], "alternates": {"html": "https://wpnews.pro/news/a-3-step-agent-cost-me-4-20-agenttrace-showed-me-the-o-n-tool-call-hiding-in", "markdown": "https://wpnews.pro/news/a-3-step-agent-cost-me-4-20-agenttrace-showed-me-the-o-n-tool-call-hiding-in.md", "text": "https://wpnews.pro/news/a-3-step-agent-cost-me-4-20-agenttrace-showed-me-the-o-n-tool-call-hiding-in.txt", "jsonld": "https://wpnews.pro/news/a-3-step-agent-cost-me-4-20-agenttrace-showed-me-the-o-n-tool-call-hiding-in.jsonld"}}