How to Monitor AI Agents in Production

OpenObserve has introduced a monitoring solution for AI agents in production that relies on distributed tracing rather than logs, as a single user request can trigger ten or more internal operations across LLM calls, tool invocations, and agent steps. The system uses OpenTelemetry's GenAI semantic conventions to standardize span attributes for these operations, with auto-instrumentation libraries like OpenLLMetry, OpenInference, and OpenLIT requiring only two to three lines of initialization code. Traces are shipped to OpenObserve over OTLP, enabling SQL-queryable trace data, token usage dashboards, cost attribution by agent and model, and alerting on latency and cost anomalies.

TLDR

- Monitoring AI agents in production requires distributed tracing: a single user request fans out into 10 or more internal operations, and logs alone cannot show you which step is slow, failing, or burning your token budget.
- OpenTelemetry's `gen_ai.*` semantic conventions give you standardized span attributes for LLM calls, tool invocations, and agent steps. Some are stable today; others are still experimental.
- Auto-instrumentation libraries (OpenLLMetry, OpenInference, OpenLIT) cover most agent frameworks with two to three lines of initialization code. You do not change your agent code.
- Traces ship to OpenObserve over OTLP. From there you get SQL-queryable trace data, token usage dashboards, cost attribution by agent and model, and alerting on latency and cost anomalies.
- OpenObserve also exposes an MCP server. You can query your live agent traces from a Claude or GPT session without opening a dashboard.

A single LLM call is straightforward to observe. One HTTP request, one response, one latency number. You can log the input and output and call it done.

An agent is different. When a user sends a message, the agent calls an LLM to decide what to do, invokes a tool, processes the result, calls the LLM again, possibly calls another tool, and eventually returns a response. That one user message becomes ten or more internal operations. Some of those operations call external APIs. Some retry. Some spawn sub-agents.

Without distributed tracing, you see none of this structure. You know the response took 8 seconds. You do not know whether the LLM took 7 of those seconds or whether a tool made three retries before timing out.

Four categories of problems appear in production agents that you cannot debug without traces:

Distributed tracing gives you a complete record of every operation, in order, with timing and attributes. That record is what makes these questions answerable.

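
To make the 8-second example concrete, here is a minimal sketch of the kind of question a trace can answer and a log line cannot. The `Span` record, the `kind` labels, and the span names are all hypothetical simplifications chosen for illustration; real OpenTelemetry spans carry trace IDs, span IDs, and many more attributes.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Span(NamedTuple):
    """A simplified span record; real spans carry many more fields."""
    name: str
    kind: str              # hypothetical label: "agent", "llm", or "tool"
    parent: Optional[str]  # name of the parent span, None for the root
    start_ms: int
    end_ms: int


# Hypothetical trace for one user message that took 8 seconds end to end.
TRACE = [
    Span("agent.step", "agent", None, 0, 8000),
    Span("llm.plan", "llm", "agent.step", 0, 600),
    Span("tool.search#1", "tool", "agent.step", 600, 2600),
    Span("tool.search#2", "tool", "agent.step", 2600, 4600),
    Span("tool.search#3", "tool", "agent.step", 4600, 6600),
    Span("llm.answer", "llm", "agent.step", 6600, 8000),
]


def time_by_kind(spans):
    """Sum the duration of every non-root span, grouped by kind."""
    totals = defaultdict(int)
    for span in spans:
        if span.parent is not None:
            totals[span.kind] += span.end_ms - span.start_ms
    return dict(totals)


print(time_by_kind(TRACE))
```

In this made-up trace the three tool attempts account for 6 of the 8 seconds, which is the kind of breakdown the total response time alone cannot give you.
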

OpenTelemetry's GenAI semantic conventions define a standard set of span attributes for AI workloads. The stable attributes you can build on today:

| Attribute | What it captures |
|---|---|
| `gen_ai.system` | LLM provider: `openai`, `anthropic`, `cohere` |
| `gen_ai.operation.name` | Operation type: `chat`, `embeddings`, `text_completion` |
| `gen_ai.request.model` | Model name: `gpt-4o`, `claude-3-5-sonnet-20241022` |
| `gen_ai.usage.input_tokens` | Tokens consumed by the prompt |
| `gen_ai.usage.output_tokens` | Tokens in the model response |
| `gen_ai.response.finish_reasons` | Why the model stopped: `stop`, `tool_calls`, `length` |

For agent-specific spans, the conventions extend to `gen_ai.agent.name`, `gen_ai.agent.description`, `gen_ai.tool.name`, and `gen_ai.tool.description`. These are still marked experimental as of early 2026 but are already implemented by the major instrumentation libraries and are stable enough to use in production.

For a full breakdown of what OpenTelemetry captures for LLM workloads, including how SRE teams use the three signal types together, see [OpenTelemetry for LLMs: Complete SRE Guide](https://openobserve.ai/blog/opentelemetry-for-llms/).

Every significant operation in an agent's lifecycle becomes a span:

- `gen_ai.chat`: wraps a single LLM API call. Carries model name, token counts, and finish reason.
- `gen_ai.tool`: wraps a single tool invocation. Child of the LLM call span that requested it.
- `agent.step`: wraps one full reasoning cycle. Parent of all LLM and tool spans within that cycle.

Prompt and completion content is large. Storing it as span attributes inflates trace payloads and storage costs. The OTel GenAI convention puts prompt and completion content into span events typed `gen_ai.content.prompt` and `gen_ai.content.completion` rather than attributes. Events attach to the span but are stored separately, keeping the attribute payload small while preserving full content for debugging.

In practice: leave content capture enabled during development.
Before shipping to production, disable it at the application level or route it through the Collector for redaction.

When an orchestrator delegates to a worker agent, the worker's spans need to appear under the same root trace. For HTTP-based delegation, include the W3C `traceparent` header in the outgoing request and extract it in the worker. For in-process delegation (LangGraph node transitions, OpenAI Agents SDK handoffs), auto-instrumentation handles this automatically.

Three libraries sit between your agent code and the OTel SDK. The examples in this blog use LangChain and the OpenAI Agents SDK, both supported by all three libraries. For support across other frameworks (CrewAI, AutoGen, DSPy, and more), check each library's docs.

| Library | Signals | LangChain | OpenAI Agents | Config overhead |
|---|---|---|---|---|
| OpenLLMetry (`traceloop-sdk`) | Traces + Metrics + Logs | Yes | Yes | Medium |
| OpenInference | Traces only | Yes | Yes | Low |
| OpenLIT | Traces + Metrics | Yes | Yes | Minimal |

OpenLLMetry captures the most signals and covers the widest framework catalog. OpenLIT is the easiest entry point: one import, one function call. OpenInference is traces-only but has the closest alignment with OTel GenAI semantic conventions.

For teams starting out: use OpenLLMetry. For teams already running an OTel SDK setup: use the official `opentelemetry-instrumentation-*` packages from `opentelemetry-python-contrib`, which include `opentelemetry-instrumentation-langchain` and `opentelemetry-instrumentation-openai-agents-v2`.

For a full walkthrough of OpenLIT with OpenObserve, including pre-built dashboards for GPU and vector database monitoring, see [LLM Observability for AI Applications with OpenObserve and OpenLIT](https://openobserve.ai/blog/observability-for-ai-applications-using-openobserve-and-openlit/). For a broader comparison of open-source LLM observability tooling, see [Top Open Source LLM Observability Tools](https://openobserve.ai/blog/llm-observability-tools/).

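
Returning to the HTTP-based delegation case described earlier in this section: the sketch below shows what the W3C `traceparent` header carries and why the worker's spans end up under the orchestrator's trace. It is an illustration of the header format only, using the standard library. The function names are hypothetical, and in a real service you would let the OpenTelemetry propagators inject and extract the header rather than building it by hand.

```python
import re
import secrets
from typing import Optional

# Version 00 of the header: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def build_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a traceparent header value for an outgoing request."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[dict]:
    """Return the trace context carried by the header, or None if invalid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are not valid trace or span identifiers.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Orchestrator side: the currently active span delegates over HTTP.
trace_id = secrets.token_hex(16)          # 32 hex characters
orchestrator_span = secrets.token_hex(8)  # 16 hex characters
outgoing_headers = {"traceparent": build_traceparent(trace_id, orchestrator_span)}

# Worker side: reuse the trace ID and record the caller's span as the parent.
ctx = parse_traceparent(outgoing_headers["traceparent"])
worker_span = {
    "trace_id": ctx["trace_id"],
    "span_id": secrets.token_hex(8),
    "parent_span_id": ctx["parent_span_id"],
}
print(worker_span)
```

Because the worker reuses the incoming trace ID and records the orchestrator's span ID as its parent, both sets of spans render as one tree in the trace view.
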

The following examples use LangChain and the OpenAI Agents SDK. The instrumentation pattern is the same for virtually every other agent framework: install a library, initialize before importing framework classes, point the exporter at your backend.

LangChain's current recommended approach for building agents uses LangGraph as the execution runtime. The `opentelemetry-instrumentation-langchain` package instruments both.

Install:

```
pip install opentelemetry-sdk \
  opentelemetry-exporter-otlp-proto-http \
  opentelemetry-instrumentation-openai \
  langgraph langchain-openai
```

Initialize before any LangChain imports:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

# Send spans to OpenObserve over OTLP/HTTP.
exporter = OTLPSpanExporter(
    endpoint="<your-openobserve-otlp-endpoint>",
    headers={
        "Authorization": "Basic <base64 email:password>",
        "stream-name": "default",
    },
)

# Batch spans in the background instead of exporting one at a time.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))

# Patch the OpenAI client so every LLM call emits a span.
OpenAIInstrumentor().instrument(tracer_provider=provider)
```

Note: `opentelemetry-instrumentation-langchain` has a known compatibility issue with current LangGraph versions. `OpenAIInstrumentor` covers the spans that matter: LLM calls with token counts, model name, and finish reason. LangChain graph-level spans can be added manually if needed.

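
The `Authorization` header in the exporter configuration is standard HTTP Basic auth: the email and password joined by a colon, then base64-encoded. A minimal sketch of building that value in Python follows. The helper name and the credentials are placeholders, and in practice you would read the credentials from a secret store or environment variables rather than hard-coding them.

```python
import base64


def basic_auth_header(email: str, password: str) -> str:
    """Build an HTTP Basic auth value from an email and password."""
    raw = f"{email}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


# Placeholder credentials for illustration only.
headers = {
    "Authorization": basic_auth_header("user@example.com", "s3cret"),
    "stream-name": "default",
}
print(headers["Authorization"])
```

The resulting dictionary can be passed as the `headers` argument of `OTLPSpanExporter`.
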

A simple ReAct agent with a tool:

```python
# create_react_agent from langgraph.prebuilt takes the model and a list of tools.
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool


@tool
def get_stock_price(ticker: str) -> str:
    """Get the current stock price for a ticker symbol."""
    # Replace with your actual data source
    return f"{ticker}: $142.50"


llm = ChatOpenAI(model="gpt-4o-mini")
agent = create_react_agent(llm, [get_stock_price])

result = agent.invoke(
    {"messages": [{"role": "user", "content": "What is the price of AAPL?"}]}
)
```

You did not add a single line to the agent code. The instrumentation patches the OpenAI client that `ChatOpenAI` uses and emits a span for every LLM call.

What you get in OpenObserve: `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens`.

By default, prompt and completion content is captured. Disable it for production:

```
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=no_content
```

For the OpenAI Agents SDK, install:

```
pip install opentelemetry-sdk \
  opentelemetry-exporter-otlp-proto-http \
  opentelemetry-instrumentation-openai-agents \
  openai-agents
```

Initialize:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor

exporter = OTLPSpanExporter(
    endpoint="<your-openobserve-otlp-endpoint>",
    headers={
        "Authorization": "Basic <base64 email:password>",
        "stream-name": "default",
    },
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))

OpenAIAgentsInstrumentor().instrument(tracer_provider=provider)
```

A two-agent handoff:

```python
from agents import Agent, handoff, Runner, function_tool


@function_tool
def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for product information."""
    return f"Results for '{query}': Feature Y has been available since v2.3."


# Specialist agent that answers using the knowledge base tool.
support_agent = Agent(
    name="support_agent",
    instructions="Answer customer questions using the knowledge base.",
    tools=[search_knowledge_base],
    model="gpt-4o-mini",
)

# Entry-point agent that hands off to the specialist.
triage_agent = Agent(
    name="triage_agent",
    instructions="Route incoming requests to the correct specialist.",
    handoffs=[handoff(support_agent)],
    model="gpt-4o-mini",
)

result = Runner.run_sync(triage_agent, "How do I enable feature Y?")
```

The instrumentation generates spans for each agent activation (tagged with `gen_ai.agent.name`), each LLM generation (with model and token counts), each tool call (with name and arguments), and each handoff between agents. The handoff span shows up as a child of the triage agent span and a parent of the support agent span, giving you the full call tree.

Content capture is controlled separately from OpenLLMetry:

```
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_only
```

Options: `span_only`, `event_only`, `span_and_event`, `no_content`. Use `no_content` in production if prompts contain PII.

The OTLP exporter configuration shown in the examples above works for both self-hosted and cloud deployments. The only difference is the endpoint URL. The environment variables below use the traces-specific `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, which is taken as-is; the generic `OTEL_EXPORTER_OTLP_ENDPOINT` is treated as a base URL and has `/v1/traces` appended to it.

Self-hosted OpenObserve (port 5080):

```
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:5080/api/default/v1/traces
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64_token>,stream-name=default
```

OpenObserve Cloud:

```
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://api.openobserve.ai/api/<your_org>/v1/traces
OTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64_token>,stream-name=default
```

Generate the base64 token:

```
echo -n "your_email@example.com:your_password" | base64
```

Direct export is simpler for development and small deployments. The application sends spans directly to OpenObserve with no intermediate hop.

The OTel Collector adds a processing layer between your agent and OpenObserve.
It is worth adding when you need any of the following:

For a complete OTLP exporter configuration guide covering both the direct and Collector paths, see [LangChain and LlamaIndex Tracing with OpenObserve](https://openobserve.ai/blog/langchain-llamaindex-openobserve/).

Sample Collector configuration pointing at OpenObserve:

```
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlphttp/openobserve:
    endpoint: <your-openobserve-otlp-endpoint>
    headers:
      Authorization: "Basic <base64_token>"
      stream-name: default

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp/openobserve]
```

You can find your OTLP endpoint and the matching Authorization header in the OpenObserve UI under Data Sources → OpenTelemetry Collector. Copy the values directly from there into your Collector config.

The trace timeline shows every span as a horizontal bar: width is duration, indentation is the parent-child relationship. For a LangChain ReAct agent, you can immediately see which LLM call or tool invocation is driving latency, something that's invisible in logs.

OpenObserve lets you query trace data with SQL directly against the `gen_ai.*` attributes. For example, token usage by model over the last hour:

```
SELECT
  gen_ai_request_model AS model,
  SUM(CAST(gen_ai_usage_input_tokens AS BIGINT)) AS input_tokens,
  SUM(CAST(gen_ai_usage_output_tokens AS BIGINT)) AS output_tokens
FROM "default"
WHERE gen_ai_request_model IS NOT NULL
GROUP BY gen_ai_request_model
ORDER BY input_tokens DESC
```

Note: OpenObserve stores span attributes as top-level flattened fields using underscores (`gen_ai_request_model`, not `attributes['gen_ai.request.model']`). The time range filter is applied via the dashboard time picker rather than in SQL, since `_timestamp` is stored as nanosecond Int64 and is not directly comparable to `NOW()`.

You can extend the same pattern to P99 latency by agent (`span_name = 'agent.step'`) or error rate by tool (`span_name = 'gen_ai.tool'`).

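
If you generate these queries from code, the attribute-name flattening is easy to get wrong. The sketch below builds the token-usage query from the dotted OTel attribute names. It assumes the flattening rule is simply "replace every dot with an underscore", which matches the example in the note above but is an assumption about OpenObserve's behavior in general. Both function names are hypothetical.

```python
def flatten_attribute(name: str) -> str:
    """Map a dotted OTel attribute name to a flattened column name.

    Assumes dots become underscores, as in the gen_ai_request_model example.
    """
    return name.replace(".", "_")


def token_usage_sql(stream: str = "default") -> str:
    """Build the token-usage-by-model query for the given trace stream."""
    model = flatten_attribute("gen_ai.request.model")
    tokens_in = flatten_attribute("gen_ai.usage.input_tokens")
    tokens_out = flatten_attribute("gen_ai.usage.output_tokens")
    return (
        f"SELECT {model} AS model, "
        f"SUM(CAST({tokens_in} AS BIGINT)) AS input_tokens, "
        f"SUM(CAST({tokens_out} AS BIGINT)) AS output_tokens "
        f'FROM "{stream}" '
        f"WHERE {model} IS NOT NULL "
        f"GROUP BY {model} "
        f"ORDER BY input_tokens DESC"
    )


print(token_usage_sql())
```

Keeping the dotted names in code and flattening them in one place means the query stays readable next to the OTel documentation, and a change in the flattening rule only has to be made once.
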

For a full cost attribution setup (per-agent, per-model, with real-time spend alerting), see [LLM Cost Monitoring with OpenObserve](https://openobserve.ai/blog/llm-cost-monitoring/).

OpenObserve exposes an MCP server, so any MCP-compatible LLM client can query your trace store directly, with no dashboard or SQL client required.

Connect it to Claude Code:

```
claude mcp add o2 https://api.openobserve.ai/api/<your_org>/mcp \
  -t http \
  --header "Authorization: Basic <base64_token>"
```

For self-hosted OpenObserve, replace the URL with `http://localhost:5080/api/<your_org>/mcp`.

Once connected, ask questions like "which tool had the highest error rate in the last hour?" and get structured results back in your LLM session. For a full guide to MCP servers in the observability stack, see [What OpenObserve MCP server can do?](https://openobserve.ai/blog/mcp-servers-observability-guide/)

Disable prompt and completion capture at the application level before traces leave the process:

```
# OpenLLMetry
TRACELOOP_TRACE_CONTENT=false

# OpenAI Agents SDK / OTel GenAI instrumentation
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=no_content
```

For finer-grained redaction (specific patterns, or third-party instrumentation you don't fully control), OpenObserve has a native sensitive data redaction feature with 140+ built-in PII patterns and redact/hash/drop actions applied at ingestion time. See [Sensitive Data Redaction in OpenObserve](https://openobserve.ai/blog/sensitive-data-redaction-openobserve/) for a full walkthrough, or [the OTel Collector approach for logs](https://openobserve.ai/blog/redact-sensitive-data-in-logs/) if you prefer to handle it at the pipeline level.

LLM spans are large and frequent. Tracing at 100% is expensive. Use tail-based sampling in the Collector: keep 100% of error traces and slow traces (e.g. longer than 5s), and sample the rest probabilistically (e.g. 10%). This preserves the traces you need for debugging while keeping storage costs predictable.

For a deeper look at head- vs.
tail-based sampling tradeoffs and Collector configuration, see [Head-Based vs Tail-Based Sampling](https://openobserve.ai/blog/head-and-tail-based-sampling/).

Four alerts to configure before your agent goes to production:

- `agent.step` spans exceeds 10 seconds in a 5-minute window
- `gen_ai.usage.output_tokens` per hour exceeds your 7-day baseline by 3x
- `gen_ai.tool` span exceeds 5% in 15 minutes

OpenObserve supports scheduled and real-time alerts with SQL, PromQL, or the query builder. See the [Alerts docs](https://openobserve.ai/docs/user-guide/analytics/alerts/) to configure these.

OpenObserve Cloud gives you an OTLP endpoint ready to accept traces, metrics, and logs with no infrastructure to provision. Point your exporter at `https://api.openobserve.ai/api/<your_org>/v1/traces`, set your auth header, and agent traces start appearing in the UI within seconds. The same SQL queries, cost dashboards, and MCP server are available from day one.