{"slug": "how-to-monitor-ai-agents-in-production", "title": "How to Monitor AI Agents in Production", "summary": "OpenObserve has introduced a monitoring solution for AI agents in production that relies on distributed tracing rather than logs, as a single user request can trigger ten or more internal operations across LLM calls, tool invocations, and agent steps. The system uses OpenTelemetry's GenAI semantic conventions to standardize span attributes for these operations, with auto-instrumentation libraries like OpenLLMetry, OpenInference, and OpenLIT requiring only two to three lines of initialization code. Traces are shipped to OpenObserve over OTLP, enabling SQL-queryable trace data, token usage dashboards, cost attribution by agent and model, and alerting on latency and cost anomalies.", "body_md": "TLDR\n\n- Monitoring AI agents in production requires distributed tracing: a single user request fans out into 10 or more internal operations, and logs alone cannot show you which step is slow, failing, or burning your token budget.\n- OpenTelemetry's `gen_ai.*` semantic conventions give you standardized span attributes for LLM calls, tool invocations, and agent steps. Some are stable today; others are still experimental.\n- Auto-instrumentation libraries (OpenLLMetry, OpenInference, OpenLIT) cover most agent frameworks with two to three lines of initialization code. You do not change your agent code.\n- Traces ship to OpenObserve over OTLP. From there you get SQL-queryable trace data, token usage dashboards, cost attribution by agent and model, and alerting on latency and cost anomalies.\n- OpenObserve also exposes an MCP server. You can query your live agent traces from a Claude or GPT session without opening a dashboard.\n\nA single LLM call is straightforward to observe. One HTTP request, one response, one latency number. You can log the input and output and call it done.\n\nAn agent is different. 
When a user sends a message, the agent calls an LLM to decide what to do, invokes a tool, processes the result, calls the LLM again, possibly calls another tool, and eventually returns a response. That one user message becomes ten or more internal operations. Some of those operations call external APIs. Some retry. Some spawn sub-agents.\n\nWithout distributed tracing, you see none of this structure. You know the response took 8 seconds. You do not know whether the LLM took 7 of those seconds or whether a tool made three retries before timing out.\n\nFour categories of problems appear in production agents that you cannot debug without traces:\n\nDistributed tracing gives you a complete record of every operation, in order, with timing and attributes. That record is what makes these questions answerable.\n\nOpenTelemetry's GenAI semantic conventions define a standard set of span attributes for AI workloads. The stable attributes you can build on today:\n\n| Attribute | What it captures |\n|---|---|\n| `gen_ai.system` | LLM provider: openai, anthropic, cohere |\n| `gen_ai.operation.name` | Operation type: chat, embeddings, text_completion |\n| `gen_ai.request.model` | Model name: gpt-4o, claude-3-5-sonnet-20241022 |\n| `gen_ai.usage.input_tokens` | Tokens consumed by the prompt |\n| `gen_ai.usage.output_tokens` | Tokens in the model response |\n| `gen_ai.response.finish_reasons` | Why the model stopped: stop, tool_calls, length |\n\nFor agent-specific spans, the conventions extend to `gen_ai.agent.name`, `gen_ai.agent.description`, `gen_ai.tool.name`, and `gen_ai.tool.description`. 
These are still marked experimental as of early 2026 but are already implemented by the major instrumentation libraries and are stable enough to use in production.\n\nFor a full breakdown of what OpenTelemetry captures for LLM workloads, including how SRE teams use the three signal types together, see [OpenTelemetry for LLMs: Complete SRE Guide](https://openobserve.ai/blog/opentelemetry-for-llms/).\n\nEvery significant operation in an agent's lifecycle becomes a span:\n\n- `gen_ai.chat`: wraps a single LLM API call. Carries model name, token counts, and finish reason.\n- `gen_ai.tool`: wraps a single tool invocation. Child of the LLM call span that requested it.\n- `agent.step`: wraps one full reasoning cycle. Parent of all LLM and tool spans within that cycle.\n\nPrompt and completion content is large. Storing it as span attributes inflates trace payloads and storage costs. The OTel GenAI convention puts prompt and completion content into span events (typed `gen_ai.content.prompt` and `gen_ai.content.completion`) rather than attributes. Events attach to the span but are stored separately, keeping the attribute payload small while preserving full content for debugging.\n\nIn practice: leave content capture enabled during development. Before shipping to production, disable it at the application level or route it through the Collector for redaction.\n\nWhen an orchestrator delegates to a worker agent, the worker's spans need to appear under the same root trace. For HTTP-based delegation, include the W3C `traceparent` header in the outgoing request and extract it in the worker. For in-process delegation (LangGraph node transitions, OpenAI Agents SDK handoffs), auto-instrumentation handles this automatically.\n\nThree libraries sit between your agent code and the OTel SDK. The examples in this blog use LangChain and the OpenAI Agents SDK, both supported by all three libraries. 
For support across other frameworks (CrewAI, AutoGen, DSPy, and more), check each library's docs.\n\n| Library | Signals | LangChain | OpenAI Agents | Config overhead |\n|---|---|---|---|---|\n| OpenLLMetry (`traceloop-sdk`) | Traces + Metrics + Logs | Yes | Yes | Medium |\n| OpenInference | Traces only | Yes | Yes | Low |\n| OpenLIT | Traces + Metrics | Yes | Yes | Minimal |\n\nOpenLLMetry captures the most signals and covers the widest framework catalog. OpenLIT is the easiest entry point: one import, one function call. OpenInference is traces-only but has the closest alignment with OTel GenAI semantic conventions.\n\nFor teams starting out: use OpenLLMetry. For teams already running an OTel SDK setup: use the official `opentelemetry-instrumentation-*` packages from `opentelemetry-python-contrib`, which include `opentelemetry-instrumentation-langchain` and `opentelemetry-instrumentation-openai-agents-v2`.\n\nFor a full walkthrough of OpenLIT with OpenObserve, including pre-built dashboards for GPU and vector database monitoring, see [LLM Observability for AI Applications with OpenObserve and OpenLIT](https://openobserve.ai/blog/observability-for-ai-applications-using-openobserve-and-openlit/).\n\nFor a broader comparison of open-source LLM observability tooling, see [Top Open Source LLM Observability Tools](https://openobserve.ai/blog/llm-observability-tools/).\n\nThe following examples use LangChain and the OpenAI Agents SDK. The instrumentation pattern is the same for virtually every other agent framework: install a library, initialize before importing framework classes, point the exporter at your backend.\n\nLangChain's current recommended approach for building agents uses LangGraph as the execution runtime. 
The `opentelemetry-instrumentation-langchain` package instruments both.\n\n**Install:**\n\n```\npip install opentelemetry-sdk \\\n    opentelemetry-exporter-otlp-proto-http \\\n    opentelemetry-instrumentation-openai \\\n    langgraph langchain-openai\n```\n\n**Initialize before any LangChain imports:**\n\n``` python\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor\nfrom opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter\nfrom opentelemetry.instrumentation.openai import OpenAIInstrumentor\n\nexporter = OTLPSpanExporter(\n    endpoint=\"<your-openobserve-otlp-endpoint>\",\n    headers={\n        \"Authorization\": \"Basic <base64(email:password)>\",\n        \"stream-name\": \"default\",\n    },\n)\n\nprovider = TracerProvider()\nprovider.add_span_processor(BatchSpanProcessor(exporter))\n\nOpenAIInstrumentor().instrument(tracer_provider=provider)\n```\n\nNote: `opentelemetry-instrumentation-langchain` has a known compatibility issue with current LangGraph versions. `OpenAIInstrumentor` covers the spans that matter: LLM calls with token counts, model name, and finish reason. LangChain graph-level spans can be added manually if needed.\n\n**A simple ReAct agent with a tool:**\n\n``` python\n# The (model, tools) signature and messages-style input used below belong to LangGraph's prebuilt helper\nfrom langgraph.prebuilt import create_react_agent\nfrom langchain_openai import ChatOpenAI\nfrom langchain_core.tools import tool\n\n@tool\ndef get_stock_price(ticker: str) -> str:\n    \"\"\"Get the current stock price for a ticker symbol.\"\"\"\n    # Replace with your actual data source\n    return f\"{ticker}: $142.50\"\n\nllm = ChatOpenAI(model=\"gpt-4o-mini\")\nagent = create_react_agent(llm, [get_stock_price])\n\nresult = agent.invoke({\n    \"messages\": [{\"role\": \"user\", \"content\": \"What is the price of AAPL?\"}]\n})\n```\n\nYou did not add a single line to the agent code. 
`OpenAIInstrumentor` patches the OpenAI client that `ChatOpenAI` calls and emits a span for every LLM call the agent makes. Tool and graph-level spans are the manual addition mentioned in the note above.\n\n**What you get in OpenObserve:**\n\n- LLM call spans carrying `gen_ai.request.model`, `gen_ai.usage.input_tokens`, and `gen_ai.usage.output_tokens`\n\nBy default, prompt and completion content is captured. Disable it for production:\n\n```\nOTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=no_content\n```\n\n**Install:**\n\n```\npip install opentelemetry-sdk \\\n    opentelemetry-exporter-otlp-proto-http \\\n    opentelemetry-instrumentation-openai-agents \\\n    openai-agents\n```\n\n**Initialize:**\n\n``` python\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor\nfrom opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter\nfrom opentelemetry.instrumentation.openai_agents import OpenAIAgentsInstrumentor\n\nexporter = OTLPSpanExporter(\n    endpoint=\"<your-openobserve-otlp-endpoint>\",\n    headers={\n        \"Authorization\": \"Basic <base64(email:password)>\",\n        \"stream-name\": \"default\",\n    },\n)\n\nprovider = TracerProvider()\nprovider.add_span_processor(BatchSpanProcessor(exporter))\nOpenAIAgentsInstrumentor().instrument(tracer_provider=provider)\n```\n\n**A two-agent handoff:**\n\n``` python\nfrom agents import Agent, handoff, Runner, function_tool\n\n@function_tool\ndef search_knowledge_base(query: str) -> str:\n    \"\"\"Search the internal knowledge base for product information.\"\"\"\n    return f\"Results for '{query}': Feature Y has been available since v2.3.\"\n\nsupport_agent = Agent(\n    name=\"support_agent\",\n    instructions=\"Answer customer questions using the knowledge base.\",\n    tools=[search_knowledge_base],\n    model=\"gpt-4o-mini\",\n)\n\ntriage_agent = Agent(\n    name=\"triage_agent\",\n    instructions=\"Route incoming requests to the correct specialist.\",\n    handoffs=[handoff(support_agent)],\n    
model=\"gpt-4o-mini\",\n)\n\nresult = Runner.run_sync(triage_agent, \"How do I enable feature Y?\")\n```\n\nThe instrumentation generates spans for each agent activation (tagged with `gen_ai.agent.name`), each LLM generation (with model and token counts), each tool call (with name and arguments), and each handoff between agents. The handoff span shows up as a child of the triage agent span and a parent of the support agent span, giving you the full call tree.\n\nContent capture is controlled separately from OpenLLMetry:\n\n```\nOTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=span_only\n```\n\nOptions: `span_only`, `event_only`, `span_and_event`, `no_content`. Use `no_content` in production if prompts contain PII.\n\nThe OTLP exporter configuration shown in the examples above works for both self-hosted and cloud deployments. The only difference is the endpoint URL.\n\n**Self-hosted OpenObserve (port 5080):**\n\n```\nOTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:5080/api/default/v1/traces\nOTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64_token>,stream-name=default\n```\n\n**OpenObserve Cloud:**\n\n```\nOTEL_EXPORTER_OTLP_ENDPOINT=https://api.openobserve.ai/api/<your_org>/v1/traces\nOTEL_EXPORTER_OTLP_HEADERS=Authorization=Basic <base64_token>,stream-name=default\n```\n\nGenerate the base64 token:\n\n```\necho -n \"your_email@example.com:your_password\" | base64\n```\n\nDirect export is simpler for development and small deployments. The application sends spans directly to OpenObserve with no intermediate hop.\n\nThe OTel Collector adds a processing layer between your agent and OpenObserve. 
It is worth adding when you need any of the following:\n\nFor a complete OTLP exporter configuration guide covering both the direct and Collector paths, see [LangChain and LlamaIndex Tracing with OpenObserve](https://openobserve.ai/blog/langchain-llamaindex-openobserve/).\n\n**Sample Collector configuration pointing at OpenObserve:**\n\n```\nreceivers:\n  otlp:\n    protocols:\n      grpc:\n        endpoint: 0.0.0.0:4317\n      http:\n        endpoint: 0.0.0.0:4318\n\nprocessors:\n  batch:\n\nexporters:\n  otlphttp/openobserve:\n    endpoint: <your-openobserve-otlp-endpoint>\n    headers:\n      Authorization: \"Basic <base64_token>\"\n      stream-name: default\n\nservice:\n  pipelines:\n    traces:\n      receivers: [otlp]\n      processors: [batch]\n      exporters: [otlphttp/openobserve]\n```\n\nYou can find your OTLP endpoint and the matching `Authorization` header in the OpenObserve UI under **Data Sources → OpenTelemetry Collector**. Copy the values directly from there into your Collector config.\n\nThe trace timeline shows every span as a horizontal bar: width is duration, indentation is the parent-child relationship. For a LangChain ReAct agent, you can immediately see which LLM call or tool invocation is driving latency, something that's invisible in logs.\n\nOpenObserve lets you query trace data with SQL directly against the `gen_ai.*` attributes. For example, token usage by model over the last hour:\n\n```\nSELECT\n    gen_ai_request_model AS model,\n    SUM(CAST(gen_ai_usage_input_tokens AS BIGINT)) AS input_tokens,\n    SUM(CAST(gen_ai_usage_output_tokens AS BIGINT)) AS output_tokens\nFROM default\nWHERE gen_ai_request_model IS NOT NULL\nGROUP BY gen_ai_request_model\nORDER BY input_tokens DESC\n```\n\nNote: OpenObserve stores span attributes as top-level flattened fields using underscores (`gen_ai_request_model`, not `attributes['gen_ai.request.model']`). 
The time range filter is applied via the dashboard time picker rather than in SQL, since `_timestamp` is stored as nanosecond `Int64` and is not directly comparable to `NOW()`.\n\nYou can extend the same pattern to P99 latency by agent (`span_name = 'agent.step'`) or error rate by tool (`span_name = 'gen_ai.tool'`). For a full cost attribution setup (per-agent, per-model, with real-time spend alerting), see [LLM Cost Monitoring with OpenObserve](https://openobserve.ai/blog/llm-cost-monitoring/).\n\nOpenObserve exposes an MCP server, so any MCP-compatible LLM client can query your trace store directly, with no dashboard or SQL client required. Connect it to Claude Code:\n\n```\nclaude mcp add o2 https://api.openobserve.ai/api/<your_org>/mcp \\\n  -t http \\\n  --header \"Authorization: Basic <base64_token>\"\n```\n\nFor self-hosted OpenObserve, replace the URL with `http://localhost:5080/api/<your_org>/mcp`. Once connected, ask questions like \"which tool had the highest error rate in the last hour?\" and get structured results back in your LLM session.\n\nFor a full guide to MCP servers in the observability stack, see [What OpenObserve MCP server can do?](https://openobserve.ai/blog/mcp-servers-observability-guide/)\n\nDisable prompt and completion capture at the application level before traces leave the process:\n\n```\n# OpenLLMetry\nTRACELOOP_TRACE_CONTENT=false\n\n# OpenAI Agents SDK / OTel GenAI instrumentation\nOTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=no_content\n```\n\nFor finer-grained redaction (specific patterns, or third-party instrumentation you don't fully control), OpenObserve has a native sensitive data redaction feature with 140+ built-in PII patterns and redact/hash/drop actions applied at ingestion time. 
See [Sensitive Data Redaction in OpenObserve](https://openobserve.ai/blog/sensitive-data-redaction-openobserve/) for a full walkthrough, or the [OTel Collector approach for logs](https://openobserve.ai/blog/redact-sensitive-data-in-logs/) if you prefer to handle it at the pipeline level.\n\nLLM spans are large and frequent. Tracing at 100% is expensive. Use tail-based sampling in the Collector: keep 100% of error traces and slow traces (e.g. >5s), and sample the rest probabilistically (e.g. 10%). This preserves the traces you need for debugging while keeping storage costs predictable. For a deeper look at head- vs. tail-based sampling tradeoffs and Collector configuration, see [Head-Based vs Tail-Based Sampling](https://openobserve.ai/blog/head-and-tail-based-sampling/).\n\nFour alerts to configure before your agent goes to production:\n\n- Latency of `agent.step` spans exceeds 10 seconds in a 5-minute window\n- `gen_ai.usage.output_tokens` per hour exceeds your 7-day baseline by 3x\n- Error rate of any `gen_ai.tool` span exceeds 5% in 15 minutes\n\nOpenObserve supports scheduled and real-time alerts with SQL, PromQL, or the query builder. See the [Alerts docs](https://openobserve.ai/docs/user-guide/analytics/alerts/) to configure these.\n\nOpenObserve Cloud gives you an OTLP endpoint ready to accept traces, metrics, and logs with no infrastructure to provision. Point your exporter at `https://api.openobserve.ai/api/<your_org>/v1/traces`, set your auth header, and agent traces start appearing in the UI within seconds. 
The same SQL queries, cost dashboards, and MCP server are available from day one.", "url": "https://wpnews.pro/news/how-to-monitor-ai-agents-in-production", "canonical_source": "https://dev.to/manas_sharma/how-to-monitor-ai-agents-in-production-1mn2", "published_at": "2026-05-28 06:18:40+00:00", "updated_at": "2026-05-28 06:22:55.766994+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "mlops", "ai-tools"], "entities": ["OpenTelemetry", "OpenLLMetry", "OpenInference", "OpenLIT", "OpenObserve", "Claude", "GPT"], "alternates": {"html": "https://wpnews.pro/news/how-to-monitor-ai-agents-in-production", "markdown": "https://wpnews.pro/news/how-to-monitor-ai-agents-in-production.md", "text": "https://wpnews.pro/news/how-to-monitor-ai-agents-in-production.txt", "jsonld": "https://wpnews.pro/news/how-to-monitor-ai-agents-in-production.jsonld"}}