[ARTICLE · art-4267] src=dev.to pub= topic=developer-tools verified=true sentiment=· neutral

A 3-step agent cost me $4.20. agenttrace showed me the O(n²) tool call hiding in plain sight.

A simple three-step AI agent unexpectedly cost $4.20 due to a hidden bug in the cite-check step, where the model made nine tool calls instead of one because each iteration re-attached the full prior history, causing input tokens to grow quadratically. The author used a Rust crate called `agenttrace-rs` to aggregate LLM calls into runs and generate a by-step cost breakdown, which revealed the issue. After fixing the bug by implementing a sliding window instead of re-attaching full history, the same run cost only $0.14—about 30 times cheaper.

4 min read · published May 21, 2026

I ran a small agent. Three steps. One web search, one summarize, one cite-check. I had budgeted maybe 12 cents.

The bill at the end of the run was $4.20.

I knew something was off, but the per-call invoice line items were not telling me anything useful. They were just a list of messages.create calls. I needed to group them into the run that produced them and look at the cost shape.

That is the gap agenttrace-rs fills. It is a Rust crate that aggregates LLM calls into runs and gives you cost, latency, and by-model and by-step breakdowns.

The breakdown that surfaced the bug #

use agenttrace::{Trace, Run};

let mut trace = Trace::new();

let run = trace.start_run("cite-check-agent");

run.record_call(claude_cost::estimate(&req1, &resp1));
run.record_call(claude_cost::estimate(&req2, &resp2));
run.record_call(claude_cost::estimate(&req3, &resp3));
// ... and so on for every tool result/follow-up step

let summary = run.finish();
println!("{}", summary.report());

The report it printed for the $4.20 run:

run: cite-check-agent  duration: 38.4s  total_cost_usd: 4.2031
calls: 11
p50_latency_ms: 2710
p95_latency_ms: 4920

by-model:
  claude-opus-4-7:    9 calls  $4.1880  avg_input_tok: 18,420  avg_output_tok: 540
  claude-haiku-4:     2 calls  $0.0151  avg_input_tok: 1,200   avg_output_tok: 180

by-step:
  step_1_search:       1 call   $0.0184  1,800 in   220 out
  step_2_summarize:    1 call   $0.0312  3,100 in   280 out
  step_3_cite_check:   9 calls  $4.1535  avg 22,400 in   avg 510 out

Step 3 was supposed to be one call. It was nine. And the average input tokens were 22,400. That is the smoking gun.

What was actually happening #

The cite-check step had a tool the model could call to fetch a source URL. When the model called the tool, I appended the tool result to the messages list and re-called messages.create. Standard pattern.

What I missed: every iteration re-attached the full prior history, including the search results from step 1 and the summary from step 2. So call 4 had everything from calls 1-3 in its input, call 5 had everything from calls 1-4, and so on. Input tokens grew linearly per call, so total input tokens grew quadratically over the step.

The model kept calling the tool again because the prompt was structured ambiguously. So I had an unbounded loop hidden behind a 9-iteration tool dance. O(n²) input tokens for n iterations.
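To make the growth concrete, here is a minimal sketch of the arithmetic. It assumes a fixed base prompt and a fixed number of tokens appended per tool turn; the function names and the token figures are illustrative assumptions, not measurements from the run above. With full history, n calls send n·base + added·n(n−1)/2 input tokens in total, which is where the quadratic term comes from.

```rust
/// Total input tokens billed across `n` calls when every call re-sends the
/// full history. `base` is the prompt size on the first call and `added` is
/// the number of tokens each tool turn appends (both hypothetical).
fn total_input_tokens_full_history(n: u64, base: u64, added: u64) -> u64 {
    // Call k (0-indexed) sends base + k * added tokens.
    (0..n).map(|k| base + k * added).sum()
}

/// Same loop, but each call carries at most `window` prior turns.
fn total_input_tokens_windowed(n: u64, base: u64, added: u64, window: u64) -> u64 {
    (0..n).map(|k| base + k.min(window) * added).sum()
}

fn main() {
    // Illustrative numbers only, not taken from the reports in this post.
    let (base, added) = (5_000, 4_000);
    for n in [1, 3, 9, 18] {
        println!(
            "n={n:>2}  full={:>7}  windowed(2)={:>7}",
            total_input_tokens_full_history(n, base, added),
            total_input_tokens_windowed(n, base, added, 2),
        );
    }
}
```

Under these assumptions, the full-history total grows much faster than the call count, while the windowed total grows linearly once the window is full.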

The fix was small. I stopped re-attaching the full history on each tool turn and used a sliding window. Re-ran the same run cold:

run: cite-check-agent  duration: 11.2s  total_cost_usd: 0.1432
calls: 5
p50_latency_ms: 2200
p95_latency_ms: 3050

by-model:
  claude-opus-4-7:    3 calls  $0.1290
  claude-haiku-4:     2 calls  $0.0142

by-step:
  step_1_search:      1 call   $0.0181
  step_2_summarize:   1 call   $0.0308
  step_3_cite_check:  3 calls  $0.0943

14 cents. About 30x cheaper. I would not have found the bug without the by-step grouping.
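The post does not show the fix itself, so the following is only a sketch of what a sliding window over the message history could look like. The `Message` struct and the `sliding_window` helper are hypothetical and are not part of agenttrace or any SDK. It keeps the first message (the task prompt) and the most recent `window` messages.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: String,
    content: String,
}

/// Keep the first message (the task prompt) plus the most recent `window`
/// messages. Hypothetical helper for illustration only.
fn sliding_window(history: &[Message], window: usize) -> Vec<Message> {
    // Nothing to trim if the history already fits.
    if history.len() <= window + 1 {
        return history.to_vec();
    }
    let mut kept = Vec::with_capacity(window + 1);
    kept.push(history[0].clone());
    kept.extend_from_slice(&history[history.len() - window..]);
    kept
}

fn msg(role: &str, content: &str) -> Message {
    Message { role: role.to_string(), content: content.to_string() }
}

fn main() {
    let history = vec![
        msg("user", "Check the citations in this summary."),
        msg("assistant", "tool call: fetch source 1"),
        msg("user", "tool result: source 1 body"),
        msg("assistant", "tool call: fetch source 2"),
        msg("user", "tool result: source 2 body"),
    ];
    // Keeps the task prompt and the last tool call/result pair.
    for m in &sliding_window(&history, 2) {
        println!("{}: {}", m.role, m.content);
    }
}
```

One caveat when adapting this: tool-use APIs generally expect a tool result to stay paired with the call that requested it, so a real window should be cut on pair boundaries rather than at an arbitrary message count.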

What agenttrace actually does #

use agenttrace::{Trace, Tag};

let mut trace = Trace::new();
let run = trace.start_run("my-agent");

run.tag("user_id", "u_8821");
run.tag("step", "search");

// for each LLM call
run.record(agenttrace::CallRecord {
    model: "claude-opus-4-7".into(),
    input_tokens: 1800,
    output_tokens: 220,
    cache_read_tokens: 0,
    cache_write_tokens: 0,
    latency_ms: 2710,
    cost_usd: 0.0184,
    tags: vec![Tag::step("search")],
});

let summary = run.finish();
trace.append(summary);

// serialize all runs
let json = serde_json::to_string(&trace.runs())?;

It is a thin aggregator. It does not call the API. It does not make pricing decisions. You feed it call records (typically computed from claude-cost or your own pricing function) and it composes them into a run with cost, p50/p95, and per-tag breakdowns.
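Conceptually, a by-step breakdown is a fold over call records keyed by a tag. The sketch below is a standalone illustration using only the standard library; the `Call` and `StepTotals` types and the `by_step` function are assumptions made for this example, not agenttrace's types or implementation.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a call record (hypothetical; not the crate's type).
struct Call {
    step: &'static str,
    input_tokens: u64,
    cost_usd: f64,
}

/// Per-step totals: number of calls, summed input tokens, summed cost.
#[derive(Debug, Default, PartialEq)]
struct StepTotals {
    calls: u32,
    input_tokens: u64,
    cost_usd: f64,
}

/// Group calls by their step tag and accumulate totals per group.
fn by_step(calls: &[Call]) -> BTreeMap<&'static str, StepTotals> {
    let mut out: BTreeMap<&'static str, StepTotals> = BTreeMap::new();
    for c in calls {
        let t = out.entry(c.step).or_default();
        t.calls += 1;
        t.input_tokens += c.input_tokens;
        t.cost_usd += c.cost_usd;
    }
    out
}

fn main() {
    // Illustrative records; the cite_check rows are made-up values.
    let calls = [
        Call { step: "search", input_tokens: 1_800, cost_usd: 0.0184 },
        Call { step: "summarize", input_tokens: 3_100, cost_usd: 0.0312 },
        Call { step: "cite_check", input_tokens: 6_000, cost_usd: 0.10 },
        Call { step: "cite_check", input_tokens: 10_000, cost_usd: 0.16 },
    ];
    for (step, t) in by_step(&calls) {
        println!(
            "{step:<12} {} call(s)  {} in  ${:.4}",
            t.calls, t.input_tokens, t.cost_usd
        );
    }
}
```

A step that was meant to be a single call shows up immediately in this view as a group with a call count above one and an outsized share of the input tokens.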

Why p95 matters more than mean #

avg_latency_ms lies. A run with one slow call (the model thought for 12 seconds, the rest returned in 2) shows a mean of about 4 seconds. The p95 shows the actual tail. For agents this is the number that tells you whether your user-facing experience is going to feel snappy or laggy. agenttrace exposes p50, p95, and p99 by default.
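The example above can be checked with a few lines of code. This sketch assumes a run of five calls (four at 2 seconds, one at 12), which is what makes the mean come out to 4 seconds, and it uses the nearest-rank percentile definition. How agenttrace computes its percentiles is not stated in this post, so treat the method here as an assumption.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of the samples are less than or equal to it.
/// Returns None for an empty slice.
fn percentile(samples: &[u64], p: f64) -> Option<u64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Rank is 1-based: ceil(p/100 * n), clamped to [1, n].
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    let idx = rank.clamp(1, sorted.len()) - 1;
    Some(sorted[idx])
}

/// Arithmetic mean, or None for an empty slice.
fn mean(samples: &[u64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<u64>() as f64 / samples.len() as f64)
}

fn main() {
    // Four calls at 2 s and one at 12 s (the call count is an assumption).
    let latencies_ms: [u64; 5] = [2_000, 2_000, 12_000, 2_000, 2_000];
    println!("mean = {:?}", mean(&latencies_ms));
    println!("p50  = {:?}", percentile(&latencies_ms, 50.0));
    println!("p95  = {:?}", percentile(&latencies_ms, 95.0));
}
```

The mean lands between the typical call and the slow one and describes neither. The p50 reports the typical call and the p95 reports the slow one.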

Composing with other crates #

  • claude-cost for the per-call cost estimate (cache-aware).
  • cachebench to see the cache hit ratio across the run.
  • llm-circuit-breaker to short-circuit a run when an upstream is degraded so you do not pay $4.20 to discover that.

A typical pipeline in our service looks like: cachebench records hit/miss → claude-cost computes cost given hits → agenttrace aggregates into a run summary.

What this does not solve #

  • It does not store traces durably. Trace is in-memory. You serialize to disk or to a remote sink yourself. I do that with a one-line serde_json::to_writer to a sqlite blob.
  • It does not visualize. There is no UI. You get JSON or text reports. If you want a flamegraph, pipe to your own viewer.
  • It does not capture the request bodies. Pair with agenttap for that. agenttrace is the cost/latency layer, not the wire layer.
  • The tagging system is flat. There is no nested-span model. If you need that, OpenTelemetry is the right tool and otel-genai-bridge-rs can translate between conventions.

The crate is about 600 lines of pure Rust. No async lock-in.

Repo: https://github.com/MukundaKatta/agenttrace-rs

crates.io: agenttrace = { package = "agenttrace-rs", version = "0.1" }

Part of a small Rust stack I publish for AI agent plumbing: cost, retry, breakers, repair, trace. Built piece by piece from real incidents.
