Before using this dashboard, instrument your Hermes agent with OpenTelemetry and configure export to SigNoz. See the Hermes monitoring guide for complete setup instructions.
This dashboard offers a clear view into Hermes coding agent behavior and performance. It highlights key metrics such as agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends. Teams can track end-to-end turn latency, per-model token costs, and individual failing spans to keep their agents fast and reliable.
Dashboard Preview
To import this dashboard into SigNoz, go to Dashboards → + New dashboard → Import JSON.
What This Dashboard Monitors
This dashboard tracks critical performance metrics for your Hermes coding agent using OpenTelemetry traces (service: hermes-agent) to help you:
Monitor Agent Activity: Track agent turn and LLM turn counts, total tool calls, and overall API call volume to understand how actively the agent is working across sessions.
Analyze Token Consumption: Observe input, output, and cache-read token usage over time and per model to understand costs, spot consumption spikes, and optimize prompting strategies.
Track Model Usage: See which LLM models are being called, how tokens are distributed across them, and how finish reasons break down to measure model health and behavior.
Ensure Responsiveness: Monitor end-to-end agent turn latency and LLM API call latency at p50, p95, and p99 to surface slowdowns and maintain a consistent coding experience.
Understand Tool Behavior: Measure which tools are called most often, how long each tool takes, and whether tool calls succeed or error, including a summary table with call counts and p95 latency per tool.
Investigate Errors: Track error spans over time by operation, view a ranked table of the most-failing operations, and drill into individual failing spans with status messages for root-cause analysis.
Metrics Included
Overview Scorecards
Agent Turns: Count of root agent spans in the selected time range, representing the total number of agent turns or sessions processed.
LLM Turns: Count of llm.* wrapper spans, showing how many LLM interaction cycles the agent performed.
LLM API Calls: Count of spans where llm.model_name exists, representing individual chat completion calls made to the model provider.
Tool Calls: Count of tool.* spans, showing the total number of tool invocations across all agent turns.
Total Tokens: Sum of gen_ai.usage.total_tokens across all spans, giving the aggregate token consumption for the selected range.
Error Spans: Count of spans where hasError = true, with a red threshold triggered by any non-zero value for immediate attention.
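To make the scorecard definitions concrete, here is a minimal Python sketch that applies the same six rules to an in-memory list of spans. It is an illustration only, not how the dashboard runs its queries. The dict layout (`name`, `parent_id`, `attributes`, `has_error`) and the sample span names `llm.turn`, `chat`, and `tool.read_file` are assumptions made for this example.

```python
def scorecards(spans):
    """Apply the six overview scorecard rules to a list of span dicts.

    The dict layout used here is hypothetical; only the span-name patterns
    and attribute keys come from the panel descriptions above.
    """
    counts = {
        "agent_turns": 0,
        "llm_turns": 0,
        "llm_api_calls": 0,
        "tool_calls": 0,
        "total_tokens": 0,
        "error_spans": 0,
    }
    for span in spans:
        name = span["name"]
        attrs = span.get("attributes", {})
        # Agent Turns: root spans named "agent"
        if name == "agent" and span.get("parent_id") is None:
            counts["agent_turns"] += 1
        # LLM Turns: llm.* wrapper spans
        if name.startswith("llm."):
            counts["llm_turns"] += 1
        # LLM API Calls: any span that carries llm.model_name
        if "llm.model_name" in attrs:
            counts["llm_api_calls"] += 1
        # Tool Calls: tool.* spans
        if name.startswith("tool."):
            counts["tool_calls"] += 1
        # Total Tokens: sum of gen_ai.usage.total_tokens where present
        counts["total_tokens"] += attrs.get("gen_ai.usage.total_tokens", 0)
        # Error Spans: spans flagged as errored
        if span.get("has_error", False):
            counts["error_spans"] += 1
    return counts


# Hypothetical sample: one turn with one LLM cycle, one API call, one failed tool
sample_spans = [
    {"name": "agent", "parent_id": None, "attributes": {}},
    {"name": "llm.turn", "parent_id": "a1", "attributes": {}},
    {
        "name": "chat",
        "parent_id": "l1",
        "attributes": {"llm.model_name": "model-a", "gen_ai.usage.total_tokens": 1200},
    },
    {"name": "tool.read_file", "parent_id": "a1", "attributes": {}, "has_error": True},
]

print(scorecards(sample_spans))
```

Note that a single span can contribute to more than one scorecard, for example an errored tool span counts toward both Tool Calls and Error Spans.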
LLM & Model Metrics
LLM API Calls by Model: Pie chart breaking down chat completion call counts by llm.model_name, helping you understand which models are called most frequently and track adoption across model versions.
Token Usage Over Time: Time series showing input tokens, output tokens, and cache-read tokens stacked over time, revealing consumption trends and the benefit of prompt caching.
Total Tokens by Model: Pie chart showing total token consumption split by model, useful for understanding which model drives the most cost.
LLM API Call Latency (p50 / p95 / p99): Duration percentiles for chat completion spans over time, surfacing model response time trends and latency regressions.
Cost Proxy: Input vs Output Tokens by Model: Line chart plotting input and output token volume per model over time as a cost proxy, since no native cost attribute is available; scale by your per-model pricing to estimate spend.
Responses by Finish Reason: Pie chart of llm.response.finish_reason values (e.g. stop, tool_calls, length) to reveal how often the model terminates normally versus hitting limits or requesting tool use.
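The Cost Proxy panel leaves the pricing step to you. The sketch below shows one way that scaling could be done outside the dashboard. The prices are made-up placeholders, and the attribute keys `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` are assumed here (they follow the OpenTelemetry GenAI naming, but check which keys your Hermes spans actually carry).

```python
from collections import defaultdict

# Hypothetical prices in USD per one million tokens; replace with your own.
PRICES_PER_MILLION = {
    "model-a": {"input": 3.0, "output": 15.0},
}


def estimate_spend(calls, prices):
    """Estimate spend per model from per-call token attributes.

    `calls` is a list of attribute dicts, one per LLM API call span.
    Models without a price entry are skipped rather than guessed.
    """
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for attrs in calls:
        model = attrs.get("llm.model_name")
        if model is None:
            continue
        totals[model]["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        totals[model]["output"] += attrs.get("gen_ai.usage.output_tokens", 0)

    spend = {}
    for model, tokens in totals.items():
        price = prices.get(model)
        if price is None:
            continue  # no pricing known for this model
        spend[model] = (
            tokens["input"] * price["input"] + tokens["output"] * price["output"]
        ) / 1_000_000
    return spend


# Hypothetical sample calls
sample_calls = [
    {"llm.model_name": "model-a", "gen_ai.usage.input_tokens": 1_000_000,
     "gen_ai.usage.output_tokens": 200_000},
    {"llm.model_name": "model-a", "gen_ai.usage.input_tokens": 500_000,
     "gen_ai.usage.output_tokens": 0},
    {"llm.model_name": "model-b", "gen_ai.usage.input_tokens": 10_000,
     "gen_ai.usage.output_tokens": 1_000},
]

print(estimate_spend(sample_calls, PRICES_PER_MILLION))
```

This ignores cache-read tokens, which providers often bill at a different rate, so treat the result as a rough estimate.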
Agent & Turn Metrics
Agent Turns Over Time: Time series of root agent span counts, showing turn volume trends and helping identify peak activity windows or unexpected drops.
Agent Turn Duration (p50 / p95): End-to-end duration percentiles for agent spans, measuring how long complete agent turns take from start to finish.
Avg API Calls per Turn: Average of hermes.turn.api_call_count per agent span over time, showing how many model round-trips a typical turn requires.
Avg Tools per Turn: Average of hermes.turn.tool_count per agent span, indicating how tool-heavy the agent's reasoning is on a typical turn.
Turn Final Status: Pie chart of hermes.turn.final_status values, showing the distribution of how agent turns complete (e.g. success, error, timeout).
Sessions by Kind: Pie chart of hermes.session.kind values, breaking down sessions by their interaction type or mode.
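As a rough illustration of the per-turn panels, the following Python sketch averages the two turn counters and tallies final statuses over a list of agent span attributes. The input shape (one attribute dict per agent span) and the choice to average only over spans that carry the attribute are assumptions of this example; the dashboard's own aggregation may treat missing values differently.

```python
from collections import Counter


def _mean(values):
    """Arithmetic mean, or None when there is nothing to average."""
    values = list(values)
    return sum(values) / len(values) if values else None


def turn_summary(agent_spans):
    """Summarize agent turns from their span attributes.

    `agent_spans` is a list of attribute dicts, one per root agent span.
    """
    api_calls = [
        s["hermes.turn.api_call_count"]
        for s in agent_spans
        if "hermes.turn.api_call_count" in s
    ]
    tools = [
        s["hermes.turn.tool_count"]
        for s in agent_spans
        if "hermes.turn.tool_count" in s
    ]
    statuses = Counter(
        s["hermes.turn.final_status"]
        for s in agent_spans
        if "hermes.turn.final_status" in s
    )
    return {
        "avg_api_calls": _mean(api_calls),
        "avg_tools": _mean(tools),
        "final_status": dict(statuses),
    }


# Hypothetical sample of three turns
sample_turns = [
    {"hermes.turn.api_call_count": 3, "hermes.turn.tool_count": 2,
     "hermes.turn.final_status": "success"},
    {"hermes.turn.api_call_count": 5, "hermes.turn.tool_count": 0,
     "hermes.turn.final_status": "error"},
    {"hermes.turn.api_call_count": 4, "hermes.turn.tool_count": 4,
     "hermes.turn.final_status": "success"},
]

print(turn_summary(sample_turns))
```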
Tool Call Metrics
Tool Calls by Type: Pie chart of tool.* span counts grouped by operation name, showing which tool types the agent invokes most.
Tool Call Latency (p95) by Type: Line chart of p95 duration per tool over time, identifying which tools are the slowest and most likely to bottleneck agent turns.
Tool Outcomes (completed vs error): Pie chart of hermes.tool.outcome values, showing the ratio of successful versus failed tool executions.
GenAI Tool Invocations by Name: Pie chart of tool call counts grouped by tool.name (model-requested tools), revealing which tools the model chooses most during its reasoning loop.
Tool Usage Summary: Table showing each tool type with its total call count (sorted descending) and p95 latency, giving a quick reference for the most-used and slowest tools.
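The Tool Usage Summary table can be approximated offline with a short Python sketch like the one below. It groups tool spans by name, counts calls, and computes p95 with the nearest-rank method. The span dict fields (`name`, `duration_ms`) and tool names are hypothetical, and the dashboard's query engine may use a different or approximate quantile method, so small differences from the panel are expected.

```python
import math
from collections import defaultdict


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tool_usage_summary(tool_spans):
    """Build rows of (tool, calls, p95_ms), sorted by call count descending."""
    durations = defaultdict(list)
    for span in tool_spans:
        durations[span["name"]].append(span["duration_ms"])

    rows = [
        {"tool": name, "calls": len(d), "p95_ms": nearest_rank_percentile(d, 95)}
        for name, d in durations.items()
    ]
    rows.sort(key=lambda row: row["calls"], reverse=True)
    return rows


# Hypothetical sample tool spans
sample_tool_spans = [
    {"name": "tool.terminal", "duration_ms": 100},
    {"name": "tool.read_file", "duration_ms": 10},
    {"name": "tool.read_file", "duration_ms": 30},
    {"name": "tool.read_file", "duration_ms": 20},
]

for row in tool_usage_summary(sample_tool_spans):
    print(row)
```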
Error Monitoring
Errors Over Time: Time series of hasError = true spans grouped by span name, letting you see which operations are failing and when failure spikes occur.
Error Count by Operation: Table of error counts per operation name sorted descending, identifying the most-failing span types at a glance.
Recent Error Spans: List of the 25 most recent errored spans sorted by timestamp, showing the span name, status message, hermes.tool.outcome, and duration. Use this to drill into individual failures and find root causes.
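The two error tables follow a simple filter, group, and sort pattern. The Python sketch below mirrors that logic on an in-memory list of spans. The field names (`name`, `timestamp`, `has_error`) and the sample values are assumptions for illustration, not the dashboard's actual query.

```python
from collections import Counter


def error_count_by_operation(spans):
    """Return (operation name, error count) pairs, most-failing first."""
    counts = Counter(s["name"] for s in spans if s.get("has_error"))
    return counts.most_common()


def recent_error_spans(spans, limit=25):
    """Return up to `limit` errored spans, newest first by timestamp."""
    errored = [s for s in spans if s.get("has_error")]
    errored.sort(key=lambda s: s["timestamp"], reverse=True)
    return errored[:limit]


# Hypothetical sample spans; timestamps are plain integers for simplicity
sample_error_spans = [
    {"name": "tool.terminal", "timestamp": 100, "has_error": True},
    {"name": "tool.read_file", "timestamp": 200, "has_error": False},
    {"name": "chat", "timestamp": 300, "has_error": True},
    {"name": "tool.terminal", "timestamp": 400, "has_error": True},
]

print(error_count_by_operation(sample_error_spans))
print([s["name"] for s in recent_error_spans(sample_error_spans)])
```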