[ARTICLE · art-34104] src=signoz.io ↗ pub= topic=ai-agents verified=true sentiment=↑ positive

Hermes Agent Dashboard

SigNoz released a new Hermes Agent Dashboard that provides real-time monitoring of Hermes coding agent performance using OpenTelemetry traces. The dashboard tracks agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends to help teams optimize agent speed and reliability.

4 min read · Published Jun 11, 2026

Before using this dashboard, instrument your Hermes agent with OpenTelemetry and configure export to SigNoz. See the Hermes monitoring guide for complete setup instructions.

This dashboard offers a clear view into Hermes coding agent behavior and performance. It highlights key metrics such as agent turn volume, LLM API call patterns, token consumption, tool-call activity, and error trends. Teams can track end-to-end turn latency, per-model token costs, and individual failing spans to keep their agents fast and reliable.

Dashboard Preview

To import the dashboard into SigNoz, go to Dashboards → + New dashboard → Import JSON.

What This Dashboard Monitors

This dashboard tracks critical performance metrics for your Hermes coding agent using OpenTelemetry traces (service: hermes-agent) to help you:

Monitor Agent Activity: Track agent turn and LLM turn counts, total tool calls, and overall API call volume to understand how actively the agent is working across sessions.

Analyze Token Consumption: Observe input, output, and cache-read token usage over time and per model to understand costs, spot consumption spikes, and optimize prompting strategies.

Track Model Usage: See which LLM models are being called, how tokens are distributed across them, and how finish reasons break down to measure model health and behavior.

Ensure Responsiveness: Monitor end-to-end agent turn latency and LLM API call latency at p50, p95, and p99 to surface slowdowns and maintain a consistent coding experience.

Understand Tool Behavior: Measure which tools are called most often, how long each tool takes, and whether tool calls succeed or error, including a summary table with call counts and p95 latency per tool.

Investigate Errors: Track error spans over time by operation, view a ranked table of the most-failing operations, and drill into individual failing spans with status messages for root-cause analysis.

Metrics Included

Overview Scorecards

Agent Turns: Count of root agent spans in the selected time range, representing the total number of agent turns or sessions processed.

LLM Turns: Count of llm.* wrapper spans, showing how many LLM interaction cycles the agent performed.

LLM API Calls: Count of spans where llm.model_name exists, representing individual chat completion calls made to the model provider.

Tool Calls: Count of tool.* spans, showing the total number of tool invocations across all agent turns.

Total Tokens: Sum of gen_ai.usage.total_tokens across all spans, giving the aggregate token consumption for the selected range.

Error Spans: Count of spans where hasError = true, with a red threshold triggered by any non-zero value for immediate attention.
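
The scorecard definitions above can be sketched as a single pass over exported spans. This is a minimal illustration, not SigNoz's query logic: the span dictionary shape (name, parent_id, hasError, attributes) and the sample span names such as llm.turn, chat, and tool.read_file are assumptions made for the example.

```python
def scorecards(spans):
    """Compute the six overview scorecards from a list of span dicts.

    Assumed (hypothetical) span shape:
    {"name": str, "parent_id": str | None, "hasError": bool, "attributes": dict}
    """
    cards = {
        "agent_turns": 0,
        "llm_turns": 0,
        "llm_api_calls": 0,
        "tool_calls": 0,
        "total_tokens": 0,
        "error_spans": 0,
    }
    for span in spans:
        name = span["name"]
        attrs = span.get("attributes", {})
        # Root agent span: named "agent" and has no parent.
        if name == "agent" and span.get("parent_id") is None:
            cards["agent_turns"] += 1
        # llm.* wrapper spans count as LLM turns.
        if name.startswith("llm."):
            cards["llm_turns"] += 1
        # Any span carrying llm.model_name is one chat completion call.
        if "llm.model_name" in attrs:
            cards["llm_api_calls"] += 1
        if name.startswith("tool."):
            cards["tool_calls"] += 1
        cards["total_tokens"] += attrs.get("gen_ai.usage.total_tokens", 0)
        if span.get("hasError", False):
            cards["error_spans"] += 1
    return cards


# Hypothetical sample: one agent turn with one LLM call and one failing tool call.
sample = [
    {"name": "agent", "parent_id": None, "attributes": {}},
    {"name": "llm.turn", "parent_id": "a1", "attributes": {}},
    {"name": "chat", "parent_id": "l1",
     "attributes": {"llm.model_name": "model-a", "gen_ai.usage.total_tokens": 1200}},
    {"name": "tool.read_file", "parent_id": "a1", "hasError": True, "attributes": {}},
]
print(scorecards(sample))
```

Any non-zero error_spans value corresponds to the red threshold on the Error Spans card.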

LLM & Model Metrics

LLM API Calls by Model: Pie chart breaking down chat completion call counts by llm.model_name, helping you understand which models are called most frequently and track adoption across model versions.

Token Usage Over Time: Time series showing input tokens, output tokens, and cache-read tokens stacked over time, revealing consumption trends and the benefit of prompt caching.

Total Tokens by Model: Pie chart showing total token consumption split by model, useful for understanding which model drives the most cost.

LLM API Call Latency (p50 / p95 / p99): Duration percentiles for chat completion spans over time, surfacing model response time trends and latency regressions.

Cost Proxy: Input vs Output Tokens by Model: Line chart plotting input and output token volume per model over time as a cost proxy, because no native cost attribute is available. Scale by your per-model pricing to estimate spend.

Responses by Finish Reason: Pie chart of llm.response.finish_reason values (e.g. stop, tool_calls, length) to reveal how often the model terminates normally versus hitting limits or requesting tool use.
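
Scaling the cost proxy by per-model pricing might look like the sketch below. The prices and model names are placeholders, and the attribute names gen_ai.usage.input_tokens and gen_ai.usage.output_tokens are assumed from the OpenTelemetry GenAI conventions; the article itself only names gen_ai.usage.total_tokens, so check what your Hermes spans actually carry.

```python
from collections import defaultdict

# Hypothetical prices in USD per one million tokens; substitute your provider's rates.
PRICING_PER_MILLION = {
    "model-a": {"input": 3.00, "output": 15.00},
    "model-b": {"input": 0.25, "output": 1.25},
}


def estimate_spend(spans, pricing=PRICING_PER_MILLION):
    """Sum input/output tokens per model and scale them by per-model prices."""
    tokens = defaultdict(lambda: {"input": 0, "output": 0})
    for span in spans:
        attrs = span.get("attributes", {})
        model = attrs.get("llm.model_name")
        if model is None:
            continue  # not a chat completion span
        # Assumed attribute names; adjust to match your instrumentation.
        tokens[model]["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        tokens[model]["output"] += attrs.get("gen_ai.usage.output_tokens", 0)

    spend = {}
    for model, usage in tokens.items():
        rates = pricing.get(model)
        if rates is None:
            continue  # no price configured for this model, so skip it
        spend[model] = (
            usage["input"] * rates["input"] + usage["output"] * rates["output"]
        ) / 1_000_000
    return spend


# Hypothetical usage: two calls to the same model.
spans = [
    {"attributes": {"llm.model_name": "model-a",
                    "gen_ai.usage.input_tokens": 600_000,
                    "gen_ai.usage.output_tokens": 100_000}},
    {"attributes": {"llm.model_name": "model-a",
                    "gen_ai.usage.input_tokens": 400_000,
                    "gen_ai.usage.output_tokens": 100_000}},
]
print(estimate_spend(spans))
```

Cache-read tokens are left out here; if your provider bills them at a different rate, they would need their own price entry.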

Agent & Turn Metrics

Agent Turns Over Time: Time series of root agent span counts, showing turn volume trends and helping identify peak activity windows or unexpected drops.

Agent Turn Duration (p50 / p95): End-to-end duration percentiles for agent spans, measuring how long complete agent turns take from start to finish.

Avg API Calls per Turn: Average of hermes.turn.api_call_count per agent span over time, showing how many model round-trips a typical turn requires.

Avg Tools per Turn: Average of hermes.turn.tool_count per agent span, indicating how tool-heavy the agent's reasoning is on a typical turn.

Turn Final Status: Pie chart of hermes.turn.final_status values, showing the distribution of how agent turns complete (e.g. success, error, timeout).

Sessions by Kind: Pie chart of hermes.session.kind values, breaking down sessions by their interaction type or mode.
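
The per-turn averages described above amount to grouping agent spans into time buckets and averaging an attribute within each bucket. A rough sketch, assuming a hypothetical span shape with a start_time field in epoch seconds and a 60-second bucket width chosen only for illustration:

```python
from collections import defaultdict


def avg_per_turn(spans, attribute, bucket_seconds=60):
    """Average a per-turn attribute over agent spans, grouped into time buckets.

    Returns {bucket_start_epoch_seconds: average}, ordered by time.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for span in spans:
        if span["name"] != "agent":
            continue  # only agent spans carry per-turn attributes
        value = span.get("attributes", {}).get(attribute)
        if value is None:
            continue
        # Align the span's start time to the beginning of its bucket.
        bucket = int(span["start_time"] // bucket_seconds) * bucket_seconds
        sums[bucket] += value
        counts[bucket] += 1
    return {bucket: sums[bucket] / counts[bucket] for bucket in sorted(sums)}


# Hypothetical turns: two in the first minute, one in the second.
turns = [
    {"name": "agent", "start_time": 0, "attributes": {"hermes.turn.api_call_count": 2}},
    {"name": "agent", "start_time": 30, "attributes": {"hermes.turn.api_call_count": 4}},
    {"name": "agent", "start_time": 70, "attributes": {"hermes.turn.api_call_count": 5}},
]
print(avg_per_turn(turns, "hermes.turn.api_call_count"))  # {0: 3.0, 60: 5.0}
```

Passing hermes.turn.tool_count as the attribute gives the Avg Tools per Turn series in the same way.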

Tool Call Metrics

Tool Calls by Type: Pie chart of tool.* span counts grouped by operation name, showing which tool types the agent invokes most.

Tool Call Latency (p95) by Type: Line chart of p95 duration per tool over time, identifying which tools are the slowest and most likely to bottleneck agent turns.

Tool Outcomes (completed vs error): Pie chart of hermes.tool.outcome values, showing the ratio of successful versus failed tool executions.

GenAI Tool Invocations by Name: Pie chart of tool call counts grouped by tool.name (model-requested tools), revealing which tools the model chooses most during its reasoning loop.

Tool Usage Summary: Table showing each tool type with its total call count (sorted descending) and p95 latency, giving a quick reference for the most-used and slowest tools.
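
The Tool Usage Summary table can be approximated offline as follows. This sketch uses a nearest-rank percentile, which may differ slightly from the quantile method SigNoz applies, and it assumes a hypothetical duration_ms field on each span.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    samples at or below it. Expects a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tool_usage_summary(spans):
    """Return one row per tool.* span name with call count and p95 latency,
    sorted by call count descending."""
    durations = defaultdict(list)
    for span in spans:
        if span["name"].startswith("tool."):
            durations[span["name"]].append(span["duration_ms"])
    rows = [
        {"tool": name, "calls": len(vals), "p95_ms": percentile(vals, 95)}
        for name, vals in durations.items()
    ]
    rows.sort(key=lambda row: row["calls"], reverse=True)
    return rows


# Hypothetical tool spans.
spans = [
    {"name": "tool.shell", "duration_ms": 100},
    {"name": "tool.read_file", "duration_ms": 10},
    {"name": "tool.read_file", "duration_ms": 40},
    {"name": "tool.read_file", "duration_ms": 20},
    {"name": "tool.read_file", "duration_ms": 30},
]
for row in tool_usage_summary(spans):
    print(row)
```

With few samples per tool, p95 is effectively the maximum, so the latency column is most meaningful over wider time ranges.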

Error Monitoring

Errors Over Time: Time series of hasError = true spans grouped by span name, letting you see which operations are failing and when failure spikes occur.

Error Count by Operation: Table of error counts per operation name sorted descending, identifying the most-failing span types at a glance.

Recent Error Spans: List of the 25 most recent errored spans sorted by timestamp, showing the span name, status message, hermes.tool.outcome, and duration. Use this to drill into individual failures and find root causes.
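
The two error tables reduce to a grouped count and a sorted, truncated list. A minimal sketch, assuming hypothetical timestamp, status_message, and duration_ms fields on each span:

```python
from collections import Counter


def error_count_by_operation(spans):
    """Return (span name, error count) pairs, most-failing first."""
    counts = Counter(s["name"] for s in spans if s.get("hasError"))
    return counts.most_common()


def recent_error_spans(spans, limit=25):
    """Return the most recent errored spans, newest first, with the
    columns shown in the Recent Error Spans panel."""
    errored = [s for s in spans if s.get("hasError")]
    errored.sort(key=lambda s: s["timestamp"], reverse=True)
    return [
        {
            "name": s["name"],
            "status_message": s.get("status_message", ""),
            "tool_outcome": s.get("attributes", {}).get("hermes.tool.outcome"),
            "duration_ms": s.get("duration_ms"),
        }
        for s in errored[:limit]
    ]


# Hypothetical spans; timestamps are epoch seconds.
spans = [
    {"name": "tool.shell", "hasError": True, "timestamp": 3,
     "status_message": "exit code 1", "duration_ms": 250,
     "attributes": {"hermes.tool.outcome": "error"}},
    {"name": "chat", "hasError": True, "timestamp": 2,
     "status_message": "rate limited", "duration_ms": 900, "attributes": {}},
    {"name": "tool.shell", "hasError": True, "timestamp": 1,
     "status_message": "timeout", "duration_ms": 30000,
     "attributes": {"hermes.tool.outcome": "error"}},
    {"name": "agent", "hasError": False, "timestamp": 4, "attributes": {}},
]
print(error_count_by_operation(spans))
print(recent_error_spans(spans, limit=2))
```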
