{"slug": "nobody-is-measuring-what-your-ai-agents-are-worth", "title": "Nobody Is Measuring What Your AI Agents Are Worth", "summary": "A new open-source tool called agent-panorama converts raw LLM agent traces into plain-English reports for managers, answering whether agents are worth their cost. It works with LangChain and LangGraph apps, requires only a single callback to install, and outputs Markdown, HTML, or JSON reports with value scoring and cost-per-valuable-conversation metrics.", "body_md": "Turn raw LLM agent traces into a report a **manager** can actually read - what your\nagents did, whether it was worth it, and what it cost. Point it at a Langfuse (or\nLangSmith) export (or add a one-line live callback) and get clean Markdown + a\nself-contained HTML report - and a local dashboard - in plain business language.\n\n**It's one line to switch on.** An engineer drops a single callback into your\nexisting agents - no rebuild, no new infrastructure, no traces to wire up - and\nthe live dashboard starts filling in. Works in any LangChain or LangGraph app today\n(more frameworks on the [roadmap](#roadmap)).\n\n**Three questions about any agent in production. Your existing tools answer the\nfirst two:**\n\n| | Question | Answered by |\n|---|---|---|\n| | Does it run? | observability - traces, tokens, latency |\n| | Is it correct? | evals - scores on a test set |\n| → | Is it worth it? | `agent-panorama` |\n\nIt answers the third - the one your CEO, client, or PM actually asks - across three rungs over the same conversations:\n\n**Clarity - \"what did they do?\"** One plain-English feed across the fleet (or a single agent): asked X → did Y → outcome. A 30-message chat becomes one line. 
No spans, no JSON.\n\n**Value - \"was it worth it?\"** An LLM judge scores each conversation against *your* definition of value (your domain, your user goal, your success criteria) and reports the value delivered, the value lost, and what to fix.\n\n**Cost - \"what did it cost?\"** Tokens → dollars → **cost per valuable conversation**, the ROI number nobody else gives you.\n\n*The fleet view - one plain-English activity feed across every agent, with per-run details, outcomes, and cost.*\n\n*Define value without YAML - a guided wizard fills in each agent's value ontology as a live constellation map, then a Value Blueprint summarizes it.*\n\nTraces are great for engineers and terrible for everyone else. `agent-panorama` translates tool calls, retries, token usage, and errors into plain English. It\nalso pulls the real user request and final answer out of LangGraph/LangChain\n`messages` payloads, so the report reads like a story, not a JSON dump:\n\n- `get_weather({\"city\": \"Paris\"})` → **\"Looked up the weather\"**\n- 3 failed model calls → **\"High retry count: 3 failed attempts before completing.\"**\n- `human_handoff(...)` → run outcome **human-escalated**\n\nTokens are the primary metric. **USD cost is opt-in** (since v0.2): supply a\n`model_prices` table in your config and the report adds dollar estimates\nalongside tokens (no prices ⇒ cost stays hidden).\n\n```\npip install agent-panorama\n# or, for local development:\nuv pip install -e \".[dev]\"\n```\n\nRequires Python 3.10+. Dependencies are intentionally minimal: `click`, `jinja2`, `pyyaml`, `python-dotenv`.\n\n```\nagent-panorama generate --input traces.json --output ./report --format html\n```\n\nOptions:\n\n| Option | Description |\n|---|---|\n| `--input` | Path, glob, or directory of JSON exports. Repeatable; globs/dirs are expanded (required). |\n| `--output` | Output directory (default `./report`). |\n| `--format` | `md`, `html`, `json`, or `both` (= md+html; default `both`). |\n| `--input-type` | `langfuse` or `langsmith` (default `langfuse`). |\n| `--config` | Optional YAML config (tool naming, thresholds, `model_prices`). |\n| `--detail` | Step narrative detail: `minimal`, `standard` (default), or `richer`. |\n| `--session` | Keep only runs matching this session id. |\n| `--since` / `--until` | Keep only runs whose start time is within this ISO date/datetime window (UTC). |\n| `--summarize` | Phrase each `minimal` result via a cheap LLM (opt-in, off by default). See below. |\n| `--summarize-model` | LLM id for `--summarize` (default `google_genai:gemini-2.5-flash-lite`). |\n\nTry it on the bundled example, or aggregate a whole fleet:\n\n```\nagent-panorama generate --input examples/langfuse_traces.json --output ./report\n\n# many traces → one fleet report + a report.json for the dashboard\nagent-panorama generate --input 'traces/*.json' --input more/ \\\n  --since 2026-05-01 --until 2026-05-31 --format json --output ./report\n```\n\nMultiple `--input` flags, glob patterns, and directories are all expanded and\naggregated into one report. The report then carries a **cross-agent activity\nfeed** and **per-agent rollups** (runs, actions, success/escalation/retry rates,\ntokens, and cost when `model_prices` is set). 
`--format json` writes a `report.json` with a stable contract (`generated_at`, `time_range`, `totals`, `feed`, `rollups`, `decision_log`) consumed by the frontend dashboard.\n\nAdd a `model_prices` table to your config to get dollar estimates next to tokens\n(prices are USD per 1M tokens; keys match model names by substring, longest\nmatch wins):\n\n```\nmodel_prices:\n  gpt-4o-mini: { input: 0.15, output: 0.60 }\n  gpt-4o:      { input: 2.50, output: 10.00 }\n  claude-3-5-sonnet: { input: 3.00, output: 15.00 }\n```\n\nWith no `model_prices` block, cost is omitted entirely and tokens remain the\nonly metric.\n\n`--format json` writes a `report.json` with a stable shape (also the input the\n[dashboard](#frontend-dashboard) consumes). Every timestamp is ISO-8601 UTC or\n`null`; every `*cost_usd` is a number or `null` (null when no `model_prices` entry matched). `outcome` is one of `success`, `human-escalated`, `failure`, `unknown`.\n\n```\n{\n  \"generated_at\": \"2026-05-31T09:42:00+00:00\",\n  \"time_range\": { \"start\": \"…\", \"end\": \"…\" },\n  \"totals\":     { \"runs\": 4, \"steps\": 7, \"tokens\": 3990, \"cost_usd\": 0.0134,\n                  \"value\": null },           // value summary when the value layer is on\n  \"feed\": [                                  // one entry per run, newest first\n    {\n      \"run_id\": \"…\", \"agent_name\": \"research-assistant\",\n      \"agent_key\": \"research-assistant\",     // slug, for stable UI grouping/colour\n      \"action\": \"Searched the web and summarized 3 papers.\",\n      \"outcome\": \"success\", \"timestamp\": \"…\",\n      \"retry_count\": 0, \"anomaly_count\": 0,\n      \"tokens\": 1234, \"cost_usd\": 0.006,\n      \"summary\": \"…\", \"facts\": [[\"Steps\", \"5\"], [\"Retries\", \"0\"]],\n      \"anomalies\": [],\n      \"value\": null                          // ValueJudgment when judged (see value layer)\n    }\n  ],\n  \"rollups\": [                               // one per agent\n    {\n      \"agent_name\": \"research-assistant\", \"agent_key\": \"research-assistant\",\n      \"runs\": 1, \"actions\": 5,\n      \"success_rate\": 1.0, \"escalation_rate\": 0.0,\n      \"failure_rate\": 0.0, \"retry_rate\": 0.0,\n      \"total_tokens\": 1234, \"total_cost_usd\": 0.006,\n      \"judged\": 0, \"avg_value_score\": null,  // value layer metrics (null when off)\n      \"valuable_rate\": null, \"cost_per_valuable_usd\": null\n    }\n  ],\n  \"decision_log\": [                          // consequential actions across agents\n    { \"timestamp\": \"…\", \"agent_name\": \"…\", \"action\": \"…\",\n      \"parameters\": \"…\", \"outcome\": \"succeeded\" }\n  ]\n}\n```\n\n``` python\nfrom agent_panorama import generate_report\n\nreport = generate_report(\n    \"traces.json\",\n    output_dir=\"./report\",\n    formats=[\"md\", \"html\"],\n    input_type=\"langfuse\",\n    config=\"config.yaml\",  # optional\n)\n\nprint(report.total_runs, report.total_tokens)\n```\n\n`generate_report` returns the in-memory `Report`, so you can also inspect runs,\nthe decision log, and anomalies programmatically without re-reading the output files (use\n`build_report_from_file` if you want the report without writing files).\n\n`generate_report` (and the lower-level `build_report_from_inputs`) accept a glob,\na directory, or a list of paths via `inputs=`, plus `session` / `since` / `until` filters. 
The returned `Report` exposes the cross-agent `feed` and per-agent\n`rollups`; `serialize_report` gives you the `report.json` dict directly.\n\n``` python\nfrom agent_panorama import (\n    generate_report, build_report_from_inputs, load_runs,\n    load_config, serialize_report,\n)\n\nreport = generate_report(\n    inputs=[\"traces/*.json\", \"more/\"],   # globs, dirs, or a single path\n    formats=[\"json\"],                    # writes report.json\n    since=\"2026-05-01\", until=\"2026-05-31\",\n    config=\"config.yaml\",                # model_prices here ⇒ cost is populated\n)\n\nfor item in report.feed:                 # newest-first activity feed\n    print(item.agent_name, item.action, item.outcome.value, item.tokens, item.cost_usd)\n\nfor r in report.rollups:                 # per-agent success/escalation/retry rates\n    print(r.agent_name, r.runs, r.success_rate, r.escalation_rate, r.retry_rate)\n\n# No files? Build in memory and serialize the JSON contract yourself:\ncfg = load_config(\"config.yaml\")         # load once, reuse below\nruns = load_runs(\"traces/*.json\", session=\"abc123\")\nmem = build_report_from_inputs(\"traces/*.json\", \"langfuse\", cfg)\npayload = serialize_report(mem, cfg)     # -> dict\n```\n\n- **Summary** - time range, total runs, total steps, total tokens (and total cost when `model_prices` is set).\n- **Fleet activity feed** *(v0.2)* - one scannable, newest-first line per run across every agent: who did what, in plain English, with outcome and timing.\n- **Per-agent rollups** *(v0.2)* - one row per agent: runs, actions, and success / escalation / retry rates, plus tokens and cost.\n- **Per-agent section** - what it was asked to do, what it did step by step (graph nodes / tool calls in plain English, at the chosen `--detail` level), final outcome, and a confidence signal (retries / fallback).\n- **Decision log** - a sortable table of every consequential action: timestamp, agent, action, parameters summarized in plain English, outcome.\n- **Anomalies** - high retry counts, slow runs, high activity, errors, fallbacks.\n\nAll configuration is optional. See [config.example.yaml](/Idank96/agent-panorama/blob/main/config.example.yaml)\nfor the full set. Highlights:\n\n```\ntool_descriptions:\n  get_weather: \"Looked up the weather\"\n\nconsequential_tools: [send_email, human_handoff]\nescalation_tools: [human_handoff, handoff_to_agent]\n\nanomaly_thresholds:\n  max_retries: 2\n  max_latency_seconds: 30\n  max_tool_calls: 15\n```\n\nBy default the report uses **no LLM** - it just reformats trace data. But in\n`--detail minimal`, a long final answer (e.g. a big Markdown table) is condensed\nwith a simple heuristic, which keeps the agent's own wording (\"Here are all the\nopen support tickets\"). If you'd rather get a crisp past-tense action\nline that keeps the identifying details and the bottom-line takeaway\n(\"**Resolved** Acme Corp's billing question - refund issued, ticket closed.\"),\nenable the opt-in `--summarize` flag, which rewrites just the\nresult via a cheap model.\n\nIt is intentionally tiny: a ~40-token fixed system prompt, **at most ~250 input\ntokens** (the result is hard-capped at 1,000 characters), and a ~25-token reply - roughly\n**300 tokens total per run**. On a free-tier model this costs nothing; on the cheapest paid model it's a fraction of a cent.\n\n- Install a provider extra (pick the one matching your model):\n\n```\npip install \"agent-panorama[gemini]\"     # Google Gemini (recommended, free tier)\npip install \"agent-panorama[openai]\"     # OpenAI\npip install \"agent-panorama[anthropic]\"  # Anthropic\n```\n\n- Get your own API key from the provider and either `export` it or put it in a `.env` file in the working directory (auto-loaded; real env vars win):\n\n```\nexport GOOGLE_API_KEY=...      
# Gemini\n# or OPENAI_API_KEY / ANTHROPIC_API_KEY\n# …or a .env file:  GOOGLE_API_KEY=...\n```\n\n- Run with `--summarize`:\n\n```\nagent-panorama generate --input traces.json --output ./report \\\n  --detail minimal --summarize\n# pick a different model:\nagent-panorama generate --input traces.json --output ./report \\\n  --detail minimal --summarize --summarize-model openai:gpt-5-nano\n```\n\nIf the provider package or key is missing, summarization is skipped gracefully (you just get the heuristic line) - it never breaks report generation.\n\nEvery call is logged to **`<output>/llm_calls.log`** - the exact system prompt,\nthe input sent (with its character count), and the output (or error) for each\nrun - so you can audit precisely what went to the model.\n\nFor this tiny one-shot call any of the models below is more than capable, so free-tier\naccess and price dominate. **Only Gemini Flash / Flash-Lite have a genuine\nno-credit-card free tier**; OpenAI/Anthropic require a positive balance.\n\n| Model (`--summarize-model`) | Price /1M (in → out) | Free tier | Provider extra | API key env var |\n|---|---|---|---|---|\n| `google_genai:gemini-2.5-flash-lite` (default) | $0.10 → $0.40 | ✅ free, no card (~1,500 req/day) | `gemini` | `GOOGLE_API_KEY` |\n| `google_genai:gemini-2.5-flash` | $0.30 → $2.50 | ✅ free tier (lower quota) | `gemini` | `GOOGLE_API_KEY` |\n| `openai:gpt-5-nano` | $0.05 → $0.40 | | `openai` | `OPENAI_API_KEY` |\n| `openai:gpt-4.1-nano` | $0.10 → $0.40 | | `openai` | `OPENAI_API_KEY` |\n| `openai:gpt-4o-mini` | $0.15 → $0.60 | | `openai` | `OPENAI_API_KEY` |\n| `anthropic:claude-haiku-4-5` | $1.00 → $5.00 | | `anthropic` | `ANTHROPIC_API_KEY` |\n\n**Pick `google_genai:gemini-2.5-flash-lite`** (the default) to run this for free.\n`gpt-5-nano` has the lowest paid input price if you already use OpenAI.\n*Prices verified May 2026 against providers' official pricing pages; check them for current rates.*\n\n- **Langfuse** trace exports - a single trace dict, the single-trace `{\"trace\": {...}, \"observations\": [...]}` shape, a list of traces, or the `{\"data\": [...]}` list-API shape. Tool calls are read from `TOOL` observations (falling back to tool spans), and from `toolCalls` / OpenAI-style `tool_calls` declared on generations.\n- **LangSmith** run exports - a flat list (or `{\"runs\": [...]}`) of run nodes; each root run is flattened into one agent run.\n\nToken usage is read from the trace (`inputUsage` / `outputUsage` or\n`usage` / `usage_metadata`). Dollar cost is opt-in via a `model_prices` config\ntable (see [USD cost](#usd-cost-opt-in)).\n\nA manager-facing **Agent Panorama** dashboard lives in [frontend/](/Idank96/agent-panorama/blob/main/frontend)\n(Vite + React + TypeScript, outside the Python package). It renders the\n`report.json` produced by `--format json`, falling back to bundled demo data\nwhen no JSON is present. See `frontend/README.md` for setup; in short:\n\n```\nagent-panorama generate --input 'traces/*.json' --format json --output ./report\ncp report/report.json frontend/public/feed.json\ncd frontend && npm install && npm run dev\n```\n\nWatch your agents **live** instead of relying on after-the-fact exports. 
One line in\nany LangChain / LangGraph app streams every completed run to a local dashboard:\n\n``` python\nfrom agent_panorama.live import PanoramaCallbackHandler\n\nagent.invoke(inputs, config={\"callbacks\": [PanoramaCallbackHandler()]})\n```\n\nThen run the dashboard server (one-time install of the `live` extra):\n\n```\npip install 'agent-panorama[live]'\nagent-panorama serve --open        # dashboard at http://localhost:8321\n```\n\nEach run appears in the activity feed within seconds of finishing - outcome,\ntool calls, tokens, anomalies, and per-agent rollups all update live (the\ndashboard polls `/api/report` every 3 s).\n\n**Designed to be safe in the instrumented app:**\n\n- The handler ships with the base package and posts runs using only the Python standard library - your agent app never needs the server dependencies.\n- Delivery never raises and never blocks beyond a 2 s timeout: if the dashboard is down, the app logs one warning and keeps working.\n- The server keeps runs in memory (`--max-runs` caps retention) and applies the same analysis as batch reports, so outcomes/anomalies match `generate`.\n\nUseful flags: `--port`, `--host`, `--config your.yaml` (same YAML as\n`generate` - tool descriptions, escalation tools, model prices), `--max-runs`.\nPoint the handler elsewhere with `PanoramaCallbackHandler(endpoint=...)` or the\n`AGENT_PANORAMA_ENDPOINT` env var.\n\nA chat agent answering 4 questions is still doing *one thing* for *one user* -\nso the feed aggregates by **(session, actor)**. Pass both in the invoke config\n(LangGraph's `thread_id` works automatically):\n\n``` python\nagent.invoke(inputs, config={\n    \"callbacks\": [PanoramaCallbackHandler()],\n    \"metadata\": {\"session_id\": \"support-42\", \"user_id\": \"user-7\"},\n})\n```\n\nAll turns of that pair collapse into a single feed entry with an\n`Interactions: 4 · 3 ok · 1 failed` breakdown, the worst turn's outcome as the\nstatus, and summed tokens/cost. 
An LLM layer then phrases the whole session in\none line - keeping the identifying details and the outcome, e.g. *\"Worked\nthrough Acme Corp's onboarding - integration is live, handed back to their\nteam.\"* - using the same cheap model as `--summarize` (install a provider extra such as\n`agent-panorama[gemini]` and set its API key; without one, a deterministic\nsummary line is shown instead). Override the model with\n`serve --summarize-model ...`. Batch reports (`generate`) aggregate the same\nway - Langfuse's native `sessionId` / `userId` are picked up automatically.\nRuns without a session id stay one-entry-per-run.\n\nTry it without LangChain: start `agent-panorama serve --open`, then run\n`python examples/live_demo.py` to stream three synthetic runs into the\ndashboard. More demos live in [examples/](/Idank96/agent-panorama/blob/main/examples), organized by\ncomplexity (`one_step/`, `two_step/`, `multi_step/`) - including a real\nLangChain example in `examples/one_step/langchain_agent.py`.\n\nThe activity feed tells you what your agents *did*. The value layer tells you\nwhether it *mattered* - judged against **your** definition of value, not a\ngeneric rubric. 
An LLM judge reads each conversation (batch exports and live\nmode alike) and produces a `ValueJudgment`: scores 0-10, the outcome in your\ndomain language, the concrete moments value was delivered or lost, actionable\nfixes, and a pass/fail verdict per success criterion.\n\nEnable it by adding a `value:` block to your YAML config (no new install - it\nuses the same provider extra and API key as `--summarize`):\n\n```\nvalue:\n  judge_model: google_genai:gemini-2.5-flash   # default; any init_chat_model id\n  max_judgments: 50            # hard cap per report - the cost guard\n  include_single_runs: true    # false = judge only multi-turn sessions\n  default:                     # your definition of value (the generic fallback)\n    domain: customer support\n    user_goal: resolve the user's issue without human escalation\n    success_criteria:\n      - issue resolved in the conversation\n    custom_dimensions:\n      self_service: Did the user finish without needing a human?\n  contexts:                    # per-agent overrides, keyed by agent_key\n    kb-assistant:\n      domain: customer support\n      user_goal: the user resolves their issue\n```\n\nA fleet rarely has one goal, so contexts are **per agent**: each agent's entry\nmerges field-wise over `default`. With `model_prices` also configured, every\nagent gets the number managers actually want - **cost per valuable\nconversation** (total spend ÷ conversations scoring ≥ 6).\n\nIn the dashboard this appears as a second **Value** view (it shows up in the\nsidebar only when something was judged): fleet averages, a per-agent value\ntable, and conversations sorted lowest-value first - because the manager's job\nis finding lost value. 
Judged feed cards carry a score pill, and the detail\npanel shows the full verdict.\n\nCost notes: each judgment is one capped LLM call (transcript hard-capped at\n~8k chars); `max_judgments` bounds batch reports, and live mode caches one\njudgment per conversation, re-judging only when a new turn arrives. Every call\nis logged to `llm_calls.log` for auditing. Without a provider/key, judging degrades\nsilently - the report still generates, just unjudged.\n\nManagers don't have to hand-write the `value:` block. The live dashboard has a\n**Value Ontology** section that builds it with them:\n\n- A **guided wizard** asks one plain-language question at a time - who the agent serves, the user's goal, what success looks like, how it fails, what's at stake - while a live constellation map fills in as they answer. \"Help me figure out\" proposes domain-specific examples (LLM-phrased with a provider key; plain deterministic questions without one).\n- On finish, each agent gets a **Value Blueprint**: a one-glance briefing - an executive summary, a completeness score, the ontology snapshot (click to expand), a plain-language \"how value is created\" narrative, and success-criteria / value-dimension / failure-mode / stakes cards, plus a fleet comparison. Switch between agents with the top pills, re-open the wizard to edit, or define a new agent's ontology from scratch.\n\nDefinitions are saved by `agent-panorama serve` to a sidecar in `--data-dir` and\n**override the YAML `value:` block**, so the judge re-maps and re-judges with the\nmanager's own words.\n\n`agent-panorama` starts as a report generator and is growing into an **oversight\nlayer for fleets of agents** - a single pane of glass for everything your agents\ndid, decided, and got wrong. 
More than logs, across more than one agent.\n\n**✅ v0.1 - Read one run clearly (today)**\n\n- Langfuse + LangSmith trace ingestion\n- Plain-language per-agent summaries, decision log, anomalies\n- Markdown + self-contained HTML output; CLI and library API\n\n**✅ v0.2 - See the whole fleet (the panorama view)**\n\n- A unified **cross-agent activity feed** - one scannable timeline of what every agent did, in plain English:\n\n```\nAgent Activity - May 28, 14:30-15:00\n\nresearch-assistant    → searched the web, summarized 3 papers            ✓ success\nscheduling-assistant  → checked the calendar, handed the task to a human ⤴ escalated\nweather-assistant     → looked up the weather (retried once), emailed it ✓ success\nbilling-agent         → issued 2 refunds, flagged 1 for review           ⚠ anomaly\n```\n\n- Aggregate many traces into one report (by session, time window, or file glob)\n- Per-agent rollups: runs, actions, success / escalation / retry rates\n- Cross-agent decision log spanning every agent in the window\n\n**✅ v0.3 - Continuous oversight: the live dashboard**\n\n- One-line LangChain/LangGraph integration (`PanoramaCallbackHandler`)\n- `agent-panorama serve` - a local server with the dashboard bundled in\n- Runs stream in as they finish; feed, rollups, and totals update live\n\n**✅ v0.4 - The value layer: was it worth it?**\n\n- LLM-as-judge scores every conversation against *your* value definition (domain, user goal, success criteria, custom dimensions - per agent)\n- Value delivered / value lost / recommended fixes, cited from the transcript\n- A second dashboard view: avg value score, valuable rate, and **cost per valuable conversation**\n- A **Value Ontology** builder in the dashboard: a guided wizard plus a per-agent **Value Blueprint** so managers define value without touching YAML\n\n**📈 v0.5 - Trends & regressions**\n\n- Track rates over time, not just a point-in-time snapshot\n- Flag regressions (escalations or retries spiking vs. 
a baseline)\n- Period-over-period comparison (\"this week vs. last\")\n\n**🔌 v0.6 - More frameworks & sources**\n\n- One-line callbacks/adapters for more agent frameworks - CrewAI, AutoGen / AG2, the OpenAI Agents SDK, AWS Strands, and more (today: LangChain / LangGraph)\n- OpenTelemetry / OpenInference and raw OpenAI-style logs\n- Optionally fetch full input/output from the Langfuse API to enrich decision-log parameters\n- Pluggable parser interface for custom trace formats\n\n**🎯 The vision - Full continuous oversight**\n\n- In-flight runs on the live dashboard (watch a run while it's still working)\n- Scheduled/continuous reports instead of one-off runs\n- Accountability views a non-engineer can sign off on (what happened, what needs a human)\n- Alerting on anomalies across the fleet\n\nHave a use case or a trace format you want supported? Open an issue.\n\n```\nuv pip install -e \".[dev]\"\npython tests/run_all_tests.py     # run the full suite\nruff check . && ruff format --check .\n```\n\nContributions are very welcome - and kept deliberately easy. No CLA, no strict process, no style police. If you use agents and want better reports, jump in.\n\n**Good first things to do:**\n\n- Add a parser for a trace format you use (see the registry in `parsers/__init__.py` - write `parse(payload) -> list[AgentRun]` and register it; nothing downstream changes).\n- Improve a plain-language summary, fix a parsing edge case, or polish the report.\n- Open an issue with a (scrubbed) trace that doesn't render well - that alone helps a lot.\n\n**The whole flow:**\n\n- Fork & branch.\n- Make your change. Run `ruff check . && ruff format .` and `python tests/run_all_tests.py` (a green suite is all that's expected - add a test if it makes sense, but don't sweat it).\n- Open a PR. Rough is fine - we'll iterate together.\n\nQuestions, ideas, half-finished patches: all welcome. Star the repo, open an issue, or just say hi. 
🙌\n\nMIT - see [LICENSE](/Idank96/agent-panorama/blob/main/LICENSE).", "url": "https://wpnews.pro/news/nobody-is-measuring-what-your-ai-agents-are-worth", "canonical_source": "https://github.com/Idank96/agent-panorama", "published_at": "2026-06-14 18:50:29+00:00", "updated_at": "2026-06-14 19:11:54.175125+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools", "ai-products"], "entities": ["agent-panorama", "Langfuse", "LangSmith", "LangChain", "LangGraph", "Google", "Gemini"], "alternates": {"html": "https://wpnews.pro/news/nobody-is-measuring-what-your-ai-agents-are-worth", "markdown": "https://wpnews.pro/news/nobody-is-measuring-what-your-ai-agents-are-worth.md", "text": "https://wpnews.pro/news/nobody-is-measuring-what-your-ai-agents-are-worth.txt", "jsonld": "https://wpnews.pro/news/nobody-is-measuring-what-your-ai-agents-are-worth.jsonld"}}