{"slug": "hermes-agent-dashboard", "title": "Hermes Agent Dashboard", "summary": "SigNoz released a new Hermes Agent Dashboard that provides real-time monitoring of Hermes coding agent performance using OpenTelemetry traces. The dashboard tracks agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends to help teams optimize agent speed and reliability.", "body_md": "Before using this dashboard, instrument your Hermes agent with OpenTelemetry and configure export to SigNoz. See the [Hermes monitoring guide](https://signoz.io/docs/hermes-monitoring/) for complete setup instructions.\n\nThis dashboard offers a clear view into Hermes coding agent behavior and performance. It highlights key metrics such as agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends. Teams can track end-to-end turn latency, per-model token costs, and individual failing spans to keep their agents fast and reliable.\n\nDashboard Preview\n\nTo import the dashboard JSON, go to Dashboards → + New dashboard → Import JSON.\n\nWhat This Dashboard Monitors\n\nThis dashboard tracks critical performance metrics for your Hermes coding agent using OpenTelemetry traces (service: `hermes-agent`) to help you:\n\n- **Monitor Agent Activity**: Track agent turn and LLM turn counts, total tool calls, and overall API call volume to understand how actively the agent is working across sessions.\n- **Analyze Token Consumption**: Observe input, output, and cache-read token usage over time and per model to understand costs, spot consumption spikes, and optimize prompting strategies.\n- **Track Model Usage**: See which LLM models are being called, how tokens are distributed across them, and how finish reasons break down to measure model health and behavior.\n- **Ensure Responsiveness**: Monitor end-to-end agent turn latency and LLM API call latency at p50, p95, and p99 to surface slowdowns and maintain a consistent coding experience.\n- **Understand Tool Behavior**: Measure which tools 
are called most often, how long each tool takes, and whether tool calls succeed or error, including a summary table with call counts and p95 latency per tool.\n- **Investigate Errors**: Track error spans over time by operation, view a ranked table of the most-failing operations, and drill into individual failing spans with status messages for root-cause analysis.\n\nMetrics Included\n\nOverview Scorecards\n\n- **Agent Turns**: Count of root `agent` spans in the selected time range, representing the total number of agent turns or sessions processed.\n- **LLM Turns**: Count of `llm.*` wrapper spans, showing how many LLM interaction cycles the agent performed.\n- **LLM API Calls**: Count of spans where `llm.model_name` exists, representing individual chat completion calls made to the model provider.\n- **Tool Calls**: Count of `tool.*` spans, showing the total number of tool invocations across all agent turns.\n- **Total Tokens**: Sum of `gen_ai.usage.total_tokens` across all spans, giving the aggregate token consumption for the selected range.\n- **Error Spans**: Count of spans where `hasError = true`, with a red threshold triggered by any non-zero value for immediate attention.\n\nLLM & Model Metrics\n\n- **LLM API Calls by Model**: Pie chart breaking down chat completion call counts by `llm.model_name`, helping you understand which models are called most frequently and track adoption across model versions.\n- **Token Usage Over Time**: Time series showing input tokens, output tokens, and cache-read tokens stacked over time, revealing consumption trends and the benefit of prompt caching.\n- **Total Tokens by Model**: Pie chart showing total token consumption split by model, useful for understanding which model drives the most cost.\n- **LLM API Call Latency (p50 / p95 / p99)**: Duration percentiles for chat completion spans over time, surfacing model response time trends and latency regressions.\n- **Cost Proxy: Input vs Output Tokens by Model**: Line chart plotting input and output token 
volume per model over time as a cost proxy, since no native cost attribute is available; scale by your per-model pricing to estimate spend.\n- **Responses by Finish Reason**: Pie chart of `llm.response.finish_reason` values (e.g. `stop`, `tool_calls`, `length`) to reveal how often the model terminates normally versus hitting limits or requesting tool use.\n\nAgent & Turn Metrics\n\n- **Agent Turns Over Time**: Time series of root `agent` span counts, showing turn volume trends and helping identify peak activity windows or unexpected drops.\n- **Agent Turn Duration (p50 / p95)**: End-to-end duration percentiles for `agent` spans, measuring how long complete agent turns take from start to finish.\n- **Avg API Calls per Turn**: Average of `hermes.turn.api_call_count` per agent span over time, showing how many model round-trips a typical turn requires.\n- **Avg Tools per Turn**: Average of `hermes.turn.tool_count` per agent span, indicating how tool-heavy the agent's reasoning is on a typical turn.\n- **Turn Final Status**: Pie chart of `hermes.turn.final_status` values, showing the distribution of how agent turns complete (e.g. 
success, error, timeout).\n- **Sessions by Kind**: Pie chart of `hermes.session.kind` values, breaking down sessions by their interaction type or mode.\n\nTool Call Metrics\n\n- **Tool Calls by Type**: Pie chart of `tool.*` span counts grouped by operation name, showing which tool types the agent invokes most.\n- **Tool Call Latency (p95) by Type**: Line chart of p95 duration per tool over time, identifying which tools are the slowest and most likely to bottleneck agent turns.\n- **Tool Outcomes (completed vs error)**: Pie chart of `hermes.tool.outcome` values, showing the ratio of successful versus failed tool executions.\n- **GenAI Tool Invocations by Name**: Pie chart of tool call counts grouped by `tool.name` (model-requested tools), revealing which tools the model chooses most during its reasoning loop.\n- **Tool Usage Summary**: Table showing each tool type with its total call count (sorted descending) and p95 latency, giving a quick reference for the most-used and slowest tools.\n\nError Monitoring\n\n- **Errors Over Time**: Time series of `hasError = true` spans grouped by span name, letting you see which operations are failing and when failure spikes occur.\n- **Error Count by Operation**: Table of error counts per operation name sorted descending, identifying the most-failing span types at a glance.\n- **Recent Error Spans**: List of the 25 most recent errored spans sorted by timestamp, showing the span name, status message, `hermes.tool.outcome`, and duration. Use this to drill into individual failures and find root causes.", "url": "https://wpnews.pro/news/hermes-agent-dashboard", "canonical_source": "https://signoz.io/docs/dashboards/dashboard-templates/hermes-dashboard", "published_at": "2026-06-11 00:00:00+00:00", "updated_at": "2026-06-19 15:39:49.440762+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-infrastructure"], "entities": ["SigNoz", "Hermes", "OpenTelemetry"], "alternates": {"html": 
"https://wpnews.pro/news/hermes-agent-dashboard", "markdown": "https://wpnews.pro/news/hermes-agent-dashboard.md", "text": "https://wpnews.pro/news/hermes-agent-dashboard.txt", "jsonld": "https://wpnews.pro/news/hermes-agent-dashboard.jsonld"}}