Nobody Is Measuring What Your AI Agents Are Worth

A new open-source tool called agent-panorama converts raw LLM agent traces into plain-English reports for managers, answering whether agents are worth their cost. It works with LangChain and LangGraph apps, requires only a single callback to install, and outputs Markdown, HTML, or JSON reports with value scoring and cost-per-valuable-conversation metrics.

Turn raw LLM agent traces into a report a manager can actually read: what your agents did, whether it was worth it, and what it cost. Point it at a Langfuse or LangSmith export (or add a one-line live callback) and get clean Markdown, a self-contained HTML report, and a local dashboard, all in plain business language.

It's one line to switch on. An engineer drops a single callback into your existing agents (no rebuild, no new infrastructure, no traces to wire up) and the live dashboard starts filling in. It works in any LangChain or LangGraph app today; more frameworks are on the roadmap.

There are three questions to ask about any agent in production. Your existing tools answer the first two:

| | Question | Answered by |
|---|---|---|
| | Does it run? | observability: traces, tokens, latency |
| | Is it correct? | evals: scores on a test set |
| → | Is it worth it? | agent-panorama |

It answers the third (the one your CEO, client, or PM actually asks) across three rungs over the same conversations:

1. **Clarity**: "what did they do?" One plain-English feed across the fleet (or a single agent): asked X → did Y → outcome. A 30-message chat becomes one line. No spans, no JSON.
2. **Value**: "was it worth it?" An LLM judge scores each conversation against your definition of value (your domain, your user goal, your success criteria) and reports the value delivered, the value lost, and what to fix.
3. **Cost**: "what did it cost?" Tokens → dollars → cost per valuable conversation, the ROI number nobody else gives you.
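As a rough illustration of the cost rung, the arithmetic behind "tokens → dollars → cost per valuable conversation" might look like the sketch below. This is not agent-panorama's implementation: the `Conversation` record, the per-1M-token prices, and the 0.7 value threshold are all hypothetical, chosen only to make the calculation concrete.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical prices in USD per 1M tokens (illustrative numbers only).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


@dataclass
class Conversation:
    input_tokens: int
    output_tokens: int
    value_score: float  # assumed judge score between 0 and 1


def cost_usd(conv: Conversation) -> float:
    """Tokens -> dollars, using the per-1M-token prices above."""
    return (
        conv.input_tokens * INPUT_PRICE_PER_M
        + conv.output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


def cost_per_valuable(convs: list, threshold: float = 0.7) -> Optional[float]:
    """Total spend divided by the number of conversations judged valuable.

    Returns None when no conversation clears the (assumed) threshold,
    because the ratio is undefined in that case.
    """
    total = sum(cost_usd(c) for c in convs)
    valuable = sum(1 for c in convs if c.value_score >= threshold)
    return total / valuable if valuable else None
```

Note that the numerator is the spend on every conversation, not only the valuable ones, so wasted runs push the figure up.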
The fleet view: one plain-English activity feed across every agent, with per-run details, outcomes, and cost.

Define value without YAML: a guided wizard fills in each agent's value ontology as a live constellation map, then a Value Blueprint summarizes it.

Traces are great for engineers and terrible for everyone else. agent-panorama translates tool calls, retries, token usage, and errors into plain English. It also pulls the real user request and final answer out of LangGraph/LangChain `messages` payloads, so the report reads like a story, not a JSON dump:

- `get_weather({"city": "Paris"})` → "Looked up the weather"
- 3 failed model calls → "High retry count: 3 failed attempts before completing."
- `human_handoff(...)` → run outcome `human-escalated`

Tokens are the primary metric. USD cost is opt-in (since v0.2): supply a `model_prices` table in your config and the report adds dollar estimates alongside tokens (no prices ⇒ cost stays hidden).

```bash
pip install agent-panorama
```

or, for local development:

```bash
uv pip install -e ".[dev]"
```

Requires Python 3.10+. Dependencies are intentionally minimal: `click`, `jinja2`, `pyyaml`, `python-dotenv`.

```bash
agent-panorama generate --input traces.json --output ./report --format html
```

Options:

| Option | Description |
|---|---|
| `--input` | Path, glob, or directory of JSON exports. Repeatable; globs/dirs are expanded (required). |
| `--output` | Output directory (default `./report`). |
| `--format` | `md`, `html`, `json`, or `both` (= md+html; default `both`). |
| `--input-type` | `langfuse` or `langsmith` (default `langfuse`). |
| `--config` | Optional YAML config (tool naming, thresholds, model prices). |
| `--detail` | Step narrative detail: `minimal`, `standard` (default), or `richer`. |
| `--session` | Keep only runs matching this session id. |
| `--since` / `--until` | Keep only runs whose start time is within this ISO date/datetime window (UTC). |
| `--summarize` | Phrase each `minimal` result via a cheap LLM (opt-in, off by default). See below. |
| `--summarize-model` | LLM id for `--summarize` (default `google_genai:gemini-2.5-flash-lite`). |

Try it on the bundled example, or aggregate a whole fleet:

```bash
agent-panorama generate --input examples/langfuse_traces.json --output ./report

# many traces → one fleet report + a feed.json for the dashboard
agent-panorama generate --input 'traces/*.json' --input more/ \
  --since 2026-05-01 --until 2026-05-31 --format json --output ./report
```

Multiple `--input` flags, glob patterns, and directories are all expanded and aggregated into one report. The report then carries a cross-agent activity feed and per-agent rollups (runs, actions, success/escalation/retry rates, tokens, and cost when `model_prices` is set). `--format json` writes a `report.json` with a stable contract (`generated_at`, `time_range`, `totals`, `feed`, `rollups`, `decision_log`) consumed by the frontend dashboard.

Add a `model_prices` table to your config to get dollar estimates next to tokens (prices are USD per 1M tokens; keys match model names by substring, longest match wins):

```yaml
model_prices:
  gpt-4o-mini: { input: 0.15, output: 0.60 }
  gpt-4o: { input: 2.50, output: 10.00 }
  claude-3-5-sonnet: { input: 3.00, output: 15.00 }
```

With no `model_prices` block, cost is omitted entirely and tokens remain the only metric.

`--format json` writes a `report.json` with a stable shape (also the input the dashboard consumes). Every timestamp is ISO-8601 UTC or `null`; every `cost_usd` is a number or `null` (`null` when no `model_prices` matched). `outcome` is one of `success`, `human-escalated`, `failure`, `unknown`.
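These rules lend themselves to a small consumer-side check. The sketch below is one way a script might validate a single feed entry before using it. `check_feed_entry` is a hypothetical helper, not part of agent-panorama, and the underscore field names (`timestamp`, `cost_usd`, `outcome`) are assumed from the contract description.

```python
from datetime import datetime

# Outcome values listed in the contract description.
ALLOWED_OUTCOMES = {"success", "human-escalated", "failure", "unknown"}


def check_feed_entry(entry: dict) -> list[str]:
    """Return a list of contract violations for one feed entry (empty if none)."""
    problems = []

    # Timestamps are either null or an ISO-8601 string.
    ts = entry.get("timestamp")
    if ts is not None:
        try:
            datetime.fromisoformat(ts)
        except (TypeError, ValueError):
            problems.append(f"timestamp is not ISO-8601: {ts!r}")

    # cost_usd is either null or a number (bool is excluded on purpose,
    # since bool is a subclass of int in Python).
    cost = entry.get("cost_usd")
    if cost is not None and (
        isinstance(cost, bool) or not isinstance(cost, (int, float))
    ):
        problems.append(f"cost_usd must be a number or null: {cost!r}")

    # outcome must be one of the four documented values.
    if entry.get("outcome") not in ALLOWED_OUTCOMES:
        problems.append(f"unexpected outcome: {entry.get('outcome')!r}")

    return problems
```

A dashboard or script could run this over each item in `feed` and skip or log entries that return a non-empty list.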
```jsonc
{
  "generated_at": "2026-05-31T09:42:00+00:00",
  "time_range": { "start": "…", "end": "…" },
  "totals": { "runs": 4, "steps": 7, "tokens": 3990, "cost_usd": 0.0134,
              "value": null },            // value summary when the value layer is on
  "feed": [                               // one entry per run, newest first
    {
      "run_id": "…",
      "agent_name": "research-assistant",
      "agent_key": "research-assistant",  // slug, for stable UI grouping/colour
      "action": "Searched the web and summarized 3 papers.",
      "outcome": "success",
      "timestamp": "…",
      "retry_count": 0,
      "anomaly_count": 0,
      "tokens": 1234,
      "cost_usd": 0.006,
      "summary": "…",
      "facts": [["Steps", "5"], ["Retries", "0"]],
      "anomalies": [],
      "value": null                       // ValueJudgment when judged (see value layer)
    }
  ],
  "rollups": [                            // one per agent
    {
      "agent_name": "research-assistant",
      "agent_key": "research-assistant",
      "runs": 1,
      "actions": 5,
      "success_rate": 1.0,
      "escalation_rate": 0.0,
      "failure_rate": 0.0,
      "retry_rate": 0.0,
      "total_tokens": 1234,
      "total_cost_usd": 0.006,
      "judged": 0,
      "avg_value_score": null,            // value layer metrics (null when off)
      "valuable_rate": null,
      "cost_per_valuable_usd": null
    }
  ],
  "decision_log": [                       // consequential actions across agents
    { "timestamp": "…", "agent_name": "…", "action": "…",
      "parameters": "…", "outcome": "succeeded" }
  ]
}
```

```python
from agent_panorama import generate_report

report = generate_report(
    "traces.json",
    output_dir="./report",
    formats=["md", "html"],
    input_type="langfuse",
    config="config.yaml",  # optional
)
print(report.total_runs, report.total_tokens)
```

`generate_report` returns the in-memory `Report`, so you can also inspect runs, the decision log, and anomalies programmatically (use `build_report_from_file` if you want the report without writing files).

`generate_report` and the lower-level `build_report_from_inputs` accept a glob, a directory, or a list of paths (via `inputs=`), plus `session` / `since` / `until` filters. The returned `Report` exposes the cross-agent `feed` and per-agent `rollups`; `serialize_report` gives you the `report.json` dict directly.
```python
from agent_panorama import (
    generate_report, build_report_from_inputs, load_runs, load_config,
    serialize_report,
)

report = generate_report(
    inputs=["traces/*.json", "more/"],   # globs, dirs, or a single path
    formats=["json"],                    # writes report.json
    since="2026-05-01", until="2026-05-31",
    config="config.yaml",                # model_prices here ⇒ cost is populated
)

for item in report.feed:                 # newest-first activity feed
    print(item.agent_name, item.action, item.outcome.value, item.tokens, item.cost_usd)

for r in report.rollups:                 # per-agent success/escalation/retry rates
    print(r.agent_name, r.runs, r.success_rate, r.escalation_rate, r.retry_rate)

# No files? Build in memory and serialize the JSON contract yourself:
runs = load_runs("traces/*.json", session="abc123")
mem = build_report_from_inputs("traces/*.json", "langfuse", load_config("config.yaml"))
payload = serialize_report(mem, load_config("config.yaml"))  # -> dict
```

- **Summary**: time range, total runs, total steps, total tokens (and total cost when `model_prices` is set).
- **Fleet activity feed** (v0.2): one scannable, newest-first line per run across every agent: who did what, in plain English, with outcome and timing.
- **Per-agent rollups** (v0.2): one row per agent: runs, actions, and success / escalation / retry rates, plus tokens and cost.
- **Per-agent section**: what it was asked to do, what it did step by step (graph nodes / tool calls in plain English, at the chosen `--detail` level), final outcome, and a confidence signal (retries / fallback).
- **Decision log**: a sortable table of every consequential action: timestamp, agent, action, parameters (summarized in plain English), outcome.
- **Anomalies**: high retry counts, slow runs, high activity, errors, fallbacks.

All configuration is optional. See [config.example.yaml](/Idank96/agent-panorama/blob/main/config.example.yaml) for the full set.
Highlights:

```yaml
tool_descriptions:
  get_weather: "Looked up the weather"
consequential_tools: [send_email, human_handoff]
escalation_tools: [human_handoff, handoff_to_agent]
anomaly_thresholds:
  max_retries: 2
  max_latency_seconds: 30
  max_tool_calls: 15
```

By default the report uses no LLM; it just reformats trace data. But in `--detail minimal`, a long final answer (e.g. a big Markdown table) is condensed with a simple heuristic, which keeps the agent's own wording ("Here are all the open support tickets"). If you'd rather get a crisp past-tense action line that keeps the identifying details and the bottom-line takeaway ("Resolved Acme Corp's billing question - refund issued, ticket closed."), enable the opt-in `--summarize` flag, which rewrites just the result via a cheap model.

It is intentionally tiny: a ~40-token fixed system prompt, at most ~250 input tokens (the result is hard-capped at 1,000 characters), and a ~25-token reply, roughly 300 tokens total per run. On a free-tier model this costs nothing; on the cheapest paid model it's a fraction of a cent.

1. Install a provider extra (pick the one matching your model):

   ```bash
   pip install "agent-panorama[gemini]"     # Google Gemini (recommended, free tier)
   pip install "agent-panorama[openai]"     # OpenAI
   pip install "agent-panorama[anthropic]"  # Anthropic
   ```

2. Get your own API key from the provider and either export it or put it in a `.env` file in the working directory (auto-loaded; real env vars win):

   ```bash
   export GOOGLE_API_KEY=...   # Gemini (or OPENAI_API_KEY / ANTHROPIC_API_KEY)
   ```

   …or a `.env` file:

   ```
   GOOGLE_API_KEY=...
   ```

3. Run with `--summarize`:

   ```bash
   agent-panorama generate --input traces.json --output ./report \
     --detail minimal --summarize

   # pick a different model:
   agent-panorama generate --input traces.json --output ./report \
     --detail minimal --summarize --summarize-model openai:gpt-5-nano
   ```

If the provider package or key is missing, summarization is skipped gracefully (you just get the heuristic line); it never breaks report generation. Every call is logged to