Hermes Agent Dashboard

SigNoz released a new Hermes Agent Dashboard that provides real-time monitoring of Hermes coding agent performance using OpenTelemetry traces. The dashboard tracks agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends to help teams optimize agent speed and reliability.

Before using this dashboard, instrument your Hermes agent with OpenTelemetry and configure export to SigNoz. See the Hermes monitoring guide (https://signoz.io/docs/hermes-monitoring/) for complete setup instructions.

This dashboard offers a clear view into Hermes coding agent behavior and performance. It highlights key metrics such as agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends. Teams can track end-to-end turn latency, per-model token costs, and individual failing spans to keep their agents fast and reliable.

Dashboard Preview

To import the dashboard: Dashboards → + New dashboard → Import JSON

What This Dashboard Monitors

This dashboard tracks critical performance metrics for your Hermes coding agent using OpenTelemetry traces (service: hermes-agent) to help you:

- Monitor Agent Activity: Track agent turn and LLM turn counts, total tool calls, and overall API call volume to understand how actively the agent is working across sessions.
- Analyze Token Consumption: Observe input, output, and cache-read token usage over time and per model to understand costs, spot consumption spikes, and optimize prompting strategies.
- Track Model Usage: See which LLM models are being called, how tokens are distributed across them, and how finish reasons break down to measure model health and behavior.
- Ensure Responsiveness: Monitor end-to-end agent turn latency and LLM API call latency at p50, p95, and p99 to surface slowdowns and maintain a consistent coding experience.
- Understand Tool Behavior: Measure which tools are called most often, how long each tool takes, and whether tool calls succeed or error, including a summary table with call counts and p95 latency per tool.
- Investigate Errors: Track error spans over time by operation, view a ranked table of the most-failing operations, and drill into individual failing spans with status messages for root-cause analysis.
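
The p50, p95, and p99 figures used throughout the dashboard are duration percentiles. As a rough illustration of what they mean (not of how SigNoz computes them internally, which this page does not describe), the following Python sketch applies the nearest-rank method to a hypothetical list of span durations. The function names and sample values are assumptions made for the example.

```python
def percentile(durations_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # Integer ceiling of p * n / 100, giving a 1-based rank.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def latency_summary(durations_ms):
    """Return the three percentiles the dashboard panels plot."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}


# Hypothetical LLM API call durations in milliseconds (illustrative only).
sample = [120, 135, 150, 160, 180, 210, 240, 300, 900, 2500]
summary = latency_summary(sample)
print(summary)
```

In this sample the median stays at 180 ms while p95 and p99 both land on the 2500 ms outlier, which is why the tail percentiles are the ones to watch when looking for slowdowns.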

Metrics Included

Overview Scorecards

- Agent Turns: Count of root agent spans in the selected time range, representing the total number of agent turns or sessions processed.
- LLM Turns: Count of llm.* wrapper spans, showing how many LLM interaction cycles the agent performed.
- LLM API Calls: Count of spans where llm.model_name exists, representing individual chat completion calls made to the model provider.
- Tool Calls: Count of tool.* spans, showing the total number of tool invocations across all agent turns.
- Total Tokens: Sum of gen_ai.usage.total_tokens across all spans, giving the aggregate token consumption for the selected range.
- Error Spans: Count of spans where hasError = true, with a red threshold triggered by any non-zero value for immediate attention.

LLM & Model Metrics

- LLM API Calls by Model: Pie chart breaking down chat completion call counts by llm.model_name, helping you understand which models are called most frequently and track adoption across model versions.
- Token Usage Over Time: Time series showing input tokens, output tokens, and cache-read tokens stacked over time, revealing consumption trends and the benefit of prompt caching.
- Total Tokens by Model: Pie chart showing total token consumption split by model, useful for understanding which model drives the most cost.
- LLM API Call Latency (p50 / p95 / p99): Duration percentiles for chat completion spans over time, surfacing model response time trends and latency regressions.
- Cost Proxy (Input vs Output Tokens by Model): Line chart plotting input and output token volume per model over time as a cost proxy, since no native cost attribute is available. Scale by your per-model pricing to estimate spend.
- Responses by Finish Reason: Pie chart of llm.response.finish_reason values (e.g. stop, tool_calls, length) to reveal how often the model terminates normally versus hitting limits or requesting tool use.
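
The Cost Proxy panel leaves the pricing step to you. Below is a minimal Python sketch of that scaling, assuming you have read per-model input and output token totals off the panel. The model names, prices, and function name are hypothetical placeholders, not values from SigNoz or any provider, and cache-read tokens are left out for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Hypothetical price in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


# Placeholder prices; substitute your provider's actual rates.
PRICES = {
    "model-a": Price(input_per_million=3.0, output_per_million=15.0),
    "model-b": Price(input_per_million=0.5, output_per_million=1.5),
}


def estimate_spend(usage, prices):
    """Map {model: (input_tokens, output_tokens)} to {model: estimated USD}."""
    spend = {}
    for model, (input_tokens, output_tokens) in usage.items():
        price = prices.get(model)
        if price is None:
            continue  # no known price: skip rather than guess
        total = (input_tokens * price.input_per_million
                 + output_tokens * price.output_per_million)
        spend[model] = total / 1_000_000
    return spend


# Token totals as they might be read from the dashboard (made-up numbers).
usage = {
    "model-a": (2_000_000, 100_000),
    "model-b": (4_000_000, 1_000_000),
}
spend = estimate_spend(usage, PRICES)
print(spend)
```

Pricing input and output tokens separately matters because providers typically charge different rates for each, so a model with modest output volume can still dominate spend.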

Agent & Turn Metrics

- Agent Turns Over Time: Time series of root agent span counts, showing turn volume trends and helping identify peak activity windows or unexpected drops.
- Agent Turn Duration (p50 / p95): End-to-end duration percentiles for agent spans, measuring how long complete agent turns take from start to finish.
- Avg API Calls per Turn: Average of hermes.turn.api_call_count per agent span over time, showing how many model round-trips a typical turn requires.
- Avg Tools per Turn: Average of hermes.turn.tool_count per agent span, indicating how tool-heavy the agent's reasoning is on a typical turn.
- Turn Final Status: Pie chart of hermes.turn.final_status values, showing the distribution of how agent turns complete (e.g. success, error, timeout).
- Sessions by Kind: Pie chart of hermes.session.kind values, breaking down sessions by their interaction type or mode.

Tool Call Metrics

- Tool Calls by Type: Pie chart of tool.* span counts grouped by operation name, showing which tool types the agent invokes most.
- Tool Call Latency p95 by Type: Line chart of p95 duration per tool over time, identifying which tools are the slowest and most likely to bottleneck agent turns.
- Tool Outcomes (completed vs error): Pie chart of hermes.tool.outcome values, showing the ratio of successful versus failed tool executions.
- GenAI Tool Invocations by Name: Pie chart of tool call counts grouped by tool.name (model-requested tools), revealing which tools the model chooses most during its reasoning loop.
- Tool Usage Summary: Table showing each tool type with its total call count (sorted descending) and p95 latency, giving a quick reference for the most-used and slowest tools.

Error Monitoring

- Errors Over Time: Time series of hasError = true spans grouped by span name, letting you see which operations are failing and when failure spikes occur.
- Error Count by Operation: Table of error counts per operation name sorted descending, identifying the most-failing span types at a glance.
- Recent Error Spans: List of the 25 most recent errored spans sorted by timestamp, showing the span name, status message, hermes.tool.outcome, and duration. Use this to drill into individual failures and find root causes.
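
To make the logic behind the Error Count by Operation and Recent Error Spans panels concrete, here is a small Python sketch over an in-memory list of spans. The dict layout (name, timestamp, hasError, and statusMessage keys), the span names, and the function names are assumptions for illustration; the dashboard itself runs the equivalent queries inside SigNoz.

```python
from collections import Counter


def error_count_by_operation(spans):
    """Count errored spans per operation name, most-failing first."""
    counts = Counter(s["name"] for s in spans if s.get("hasError"))
    return counts.most_common()


def recent_error_spans(spans, limit=25):
    """Return the newest errored spans, newest first, capped at `limit`."""
    errored = [s for s in spans if s.get("hasError")]
    return sorted(errored, key=lambda s: s["timestamp"], reverse=True)[:limit]


# Made-up spans; names and status messages are purely illustrative.
spans = [
    {"name": "tool.terminal", "timestamp": 100, "hasError": True,
     "statusMessage": "exit code 1"},
    {"name": "llm.chat", "timestamp": 110, "hasError": False},
    {"name": "tool.terminal", "timestamp": 120, "hasError": True,
     "statusMessage": "command timed out"},
    {"name": "tool.read_file", "timestamp": 130, "hasError": True,
     "statusMessage": "file not found"},
]

ranked = error_count_by_operation(spans)
latest = recent_error_spans(spans, limit=2)
print(ranked)
print([s["name"] for s in latest])
```

The ranked table points at which operation to investigate first, and the recent list supplies the individual status messages needed to find the root cause.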