{"slug": "observing-llm-applications-with-opentelemetry", "title": "Observing LLM Applications with OpenTelemetry", "summary": "OpenTelemetry, an open-source observability framework, is being adopted to monitor non-deterministic outputs and performance issues in LLM-based applications. The technology addresses challenges like hallucinations, inconsistent responses, and provider-side latency spikes that arise when integrating large language models into production systems. Developers can use OpenTelemetry's standardized instrumentation to collect telemetry data without vendor lock-in, enabling backend-agnostic monitoring of AI features.", "body_md": "# Observing LLM Applications with OpenTelemetry\n\nEver since OpenAI launched ChatGPT in November 2022, AI usage has exploded worldwide. Integrating LLMs into applications followed soon after, quickly evolving from an experimental, nice-to-have feature into a baseline competitive requirement.\n\nAnd while you can find an AI implementation in almost every product today, shipping production-ready LLM features introduces its own set of challenges that developers must contend with.\n\nIn this article, we’ll dive into why observing LLM-based applications is now a critical requirement, what OpenTelemetry is, and how to integrate it into your applications with a practical demo.\n\nAlong the way, we will also look at the current maturity of LLM-specific OpenTelemetry libraries, the GenAI Semantic Conventions, and some practical challenges you may face while instrumenting your LLM applications.\n\n## Why do LLM applications need observability?\n\nIf you are already familiar with the challenges of maintaining LLM applications across their lifecycle, feel free to [skip to the next section](#what-is-opentelemetry) that discusses OpenTelemetry.\n\n### Handling non-determinism\n\nNow you might think that observing your LLM integrations is not that different from classic observability. 
The key difference is that the output generated by LLMs is **non-deterministic**: the same input can produce completely different outputs across runs.\n\nBecause models can hallucinate unpredictably on tasks that require precise, deterministic output, developers often equip them with dedicated tools (such as a calculation function) to handle those tasks.\n\n### Ensuring context-appropriate responses\n\nNon-determinism **does not** mean that the responses are incorrect. In most scenarios, though, developers want their responses to follow a certain structure and tone.\n\nFor example, while the response \"very likely\" might be suitable for a query like \"chances of rain tomorrow\", the same response to a query like \"chances of the stock market climbing tomorrow\" is likely unacceptable, because the user expects a more nuanced answer.\n\nEnsuring that responses remain consistent across a range of user queries is one of the key factors that separates a polished LLM product from an unreliable one.\n\n### Managing quality across updates\n\nLLM providers frequently release model updates, modify their backends, and publish updated usage guidance. Meanwhile, developers experiment with model configurations and share the ones that work for them. The space is moving quickly, and each of these factors can affect the response quality of your LLM setup.\n\nFor example, LLM providers can suffer \"brown-outs\", where their infrastructure cannot keep up with user demand. These lead to latency spikes, timeouts, or even degraded response quality, which makes it critical to observe how your LLM setup holds up over time.\n\n## What is OpenTelemetry?\n\n[OpenTelemetry](https://signoz.io/opentelemetry/) (OTel) is a Cloud Native Computing Foundation (CNCF) project aimed at standardizing the way we instrument applications for generating telemetry data. 
Before OpenTelemetry, telemetry data often lived in silos, with little or no correlation between signals.\n\nOpenTelemetry follows a [specification-driven development](https://github.com/open-telemetry/opentelemetry-specification?tab=readme-ov-file) model that standardizes how telemetry is generated and collected, so any compatible backend can process and visualize the telemetry data emitted via its SDKs.\n\nSince you don't need to rewrite the instrumentation plumbing each time you change observability backends, there is **no vendor lock-in**.\n\n## Implementing OpenTelemetry in LLM Applications\n\n### Prerequisites\n\n- Python 3.12 or newer. Download the [latest version](https://www.python.org/downloads/).\n- A [SigNoz Cloud account](https://signoz.io/teams/) for visualizing the telemetry data.\n- An [OpenAI API key](https://platform.openai.com/api-keys) to use with the application.\n- An API client like [Postman](https://www.postman.com/) or [Bruno](https://www.usebruno.com/) for managing API payloads and visualizing responses.\n\nWhile earlier Python versions like 3.10 may technically work, they are nearing their [end of life](https://devguide.python.org/versions/#supported-versions). 
Python 3.12 will continue to receive security updates until late 2028.\n\n### Setting up SigNoz\n\nSigNoz is an OpenTelemetry-native observability platform that unifies logs, traces, and metrics in one place.\n\n1. [Sign up](https://signoz.io/teams/) for a free SigNoz Cloud account.\n2. [Follow the documentation](https://signoz.io/docs/ingestion/signoz-cloud/keys/) to create an ingestion key for your account.\n3. Keep your region and ingestion key values handy for the following steps.\n\nOnce done, you’re ready to configure the application and point it towards your SigNoz instance.\n\n### Running the Demo Application\n\n#### Application Setup\n\nClone the SigNoz Examples repository and navigate to the application folder:\n\n```\ngit clone https://github.com/SigNoz/examples.git\ncd examples/python/opentelemetry-llm-demo\n```\n\nCreate and activate a Python virtual environment:\n\n```\npython3.12 -m venv .venv\nsource .venv/bin/activate\n```\n\nThe `requirements.txt` file contains all the necessary [OpenTelemetry Python](https://signoz.io/docs/instrumentation/opentelemetry-python/) packages. 
Install them by running:\n\n```\npython -m pip install -r requirements.txt\n```\n\nThe following dependencies enable the OpenTelemetry instrumentation process:\n\n- `opentelemetry-distro`: provides a convenient mechanism to automatically configure the most common options, helping us get started with OpenTelemetry auto-instrumentation quickly.\n- `opentelemetry-exporter-otlp`: installs the [OTLP](https://signoz.io/blog/what-is-otlp/) exporters required to transmit telemetry data to any [OpenTelemetry backend](https://signoz.io/blog/opentelemetry-backend/).\n\nThe following command detects the libraries and frameworks (such as FastAPI) installed in the active environment and installs their matching instrumentation packages:\n\n```\nopentelemetry-bootstrap --action=install\n```\n\nFinally, we configure our environment variables and start the application, wrapping the entrypoint with `opentelemetry-instrument` to auto-instrument our application code.\n\n```\nOPENAI_API_KEY=\"<your-openai-api-key>\" \\\nOTEL_EXPORTER_OTLP_ENDPOINT=\"https://ingest.<your-region>.signoz.cloud:443\" \\\nOTEL_EXPORTER_OTLP_HEADERS=\"signoz-ingestion-key=<your-ingestion-key>\" \\\nOTEL_SERVICE_NAME=\"opentelemetry-llm-demo\" \\\nOTEL_RESOURCE_ATTRIBUTES=\"service.version=0.1.0,deployment.environment=dev\" \\\nOTEL_EXPORTER_OTLP_PROTOCOL=\"http/protobuf\" \\\nOTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true \\\nopentelemetry-instrument fastapi run --port 8085 --workers 1\n```\n\nReplace the `<your-region>` and `<your-ingestion-key>` placeholders with the region of your SigNoz workspace (e.g., `us`, `in`) and your newly created ingestion key. 
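Because a leftover placeholder typically shows up only later, as export errors or missing data in SigNoz, a quick pre-flight check can save some debugging. The following is a minimal sketch of a hypothetical helper (it is not part of the demo repository) that uses only the Python standard library. It flags any of the variables above that are unset or still contain a `<...>` placeholder, and it splits the comma-separated `key=value` lists used by `OTEL_RESOURCE_ATTRIBUTES` and `OTEL_EXPORTER_OTLP_HEADERS`. The parsing is deliberately simplified: the OpenTelemetry SDK's own parser also handles details such as percent-encoded values.

```python
import os

# Hypothetical pre-flight check; the names match the launch command above.
REQUIRED_VARS = (
    'OPENAI_API_KEY',
    'OTEL_EXPORTER_OTLP_ENDPOINT',
    'OTEL_EXPORTER_OTLP_HEADERS',
    'OTEL_SERVICE_NAME',
)


def parse_key_value_list(raw: str) -> dict[str, str]:
    # Split 'k1=v1,k2=v2' into a dict. Simplified: entries without '=' or
    # with an empty key are skipped, and values are not percent-decoded.
    pairs: dict[str, str] = {}
    for item in raw.split(','):
        key, sep, value = item.partition('=')
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def find_config_problems(env: dict[str, str]) -> list[str]:
    # Return a human-readable description of each problem found.
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name, '')
        if not value:
            problems.append(f'{name} is not set')
        elif '<' in value and '>' in value:
            problems.append(f'{name} still contains a placeholder')

    headers_raw = env.get('OTEL_EXPORTER_OTLP_HEADERS', '')
    if headers_raw and 'signoz-ingestion-key' not in parse_key_value_list(headers_raw):
        problems.append('OTEL_EXPORTER_OTLP_HEADERS has no signoz-ingestion-key entry')
    return problems


if __name__ == '__main__':
    for problem in find_config_problems(dict(os.environ)):
        print('config problem:', problem)
```

If you export the variables in your shell (instead of prefixing them to the command as shown above), you can run this script before starting the server. No output means none of these checks fired.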
You will also need to supply your OpenAI API key.\n\n`OTEL_RESOURCE_ATTRIBUTES` defines the metadata attached to all telemetry our application emits, and the service name `opentelemetry-llm-demo` (set via `OTEL_SERVICE_NAME`) ensures the OTel backend can correctly identify the telemetry source.\n\nSetting `OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED` to `true` ensures that application logs are exported alongside, and correlated with, the generated traces.\n\nRun the command, and the FastAPI server should start on port 8085. Calling the `/` endpoint returns a simple success response.\n\n```\n(.venv)  ❯ curl \"http://127.0.0.1:8085/\"\n{\"message\":\"OpenTelemetry NBA agent demo is running\"}\n```\n\nBefore we move ahead, let’s look at how the OpenTelemetry community is standardizing telemetry conventions, capture, and export for AI-driven applications.\n\n### Evolving GenAI Standards\n\nThroughout this article, we’ve used the term \"LLMs\", but as you become familiar with OpenTelemetry, you will see the term `Generative AI` (or `GenAI`) used almost exclusively. OpenTelemetry uses `GenAI` as an umbrella term for any application that uses AI models.\n\nLater, when we use our application, you’ll see that the GenAI-specific attributes and metrics ingested by SigNoz share the `gen_ai.*` prefix, such as [`gen_ai.agent.name`](https://opentelemetry.io/docs/specs/semconv/registry/attributes/gen-ai/#gen-ai-agent-name).\n\nIt is crucial to understand that these standards are evolving rapidly and are still technically in [Development status](https://opentelemetry.io/docs/specs/semconv/gen-ai/). Given this fast pace of change and the growing importance of GenAI workloads, the OpenTelemetry team recently created a [separate, dedicated repository](https://github.com/open-telemetry/opentelemetry-python-genai) to maintain GenAI instrumentation libraries.\n\nThis \"immaturity\" has a real-world impact on developers. 
For example, the `opentelemetry-bootstrap` tool currently doesn’t recognize the OpenAI Agents SDK that we’ll be using in our application, so it doesn’t install the matching instrumentation library.\n\nTo counter this, we have manually added the corresponding entry to the [`requirements.txt` file](https://github.com/SigNoz/examples/blob/main/python/opentelemetry-llm-demo/requirements.txt#L6).\n\nFurther, you will often find that the current GenAI instrumentation libraries struggle to keep full coverage of these rapidly changing AI SDKs. We will explore this in more detail when we analyze our app’s trace output.\n\n## Dissecting the OpenTelemetry LLM Application\n\nNow let's go over the key implementation details to understand how the agentic workflow is wired up. [Skip to the next section](#visualizing-agentic-workflows-with-opentelemetry) if you want to jump directly to the interactive part.\n\n### Dependency Choices\n\nBesides the essential OpenTelemetry dependencies we’ve already discussed, the application uses:\n\n- **FastAPI** as the web framework. It provides strong request and response validation, automatic OpenAPI schema generation, and first-class async support, making it an excellent choice for building performant web APIs that serve AI workloads.\n- **OpenAI Agents SDK** for building our agentic workflow. It ships with the features you would expect from an agent framework, such as a sandbox mode and human-in-the-loop mechanisms. OpenAI has published [extensive documentation](https://openai.github.io/openai-agents-python/) that makes it easy to get started, and the project has amassed about [27,000 GitHub stars](https://github.com/openai/openai-agents-python) at the time of writing. 
\n- **OpenAI Agents SDK Instrumentation**, which the `opentelemetry-bootstrap` command does not install, was added manually so that our workflow generates telemetry when it runs.\n\nThe [OpenAI Python SDK](https://github.com/openai/openai-python) is a perfectly good choice for building production-ready LLM applications that don’t need the full set of agentic orchestration capabilities.\n\nNote that the corresponding OTel instrumentation library currently lacks meaningful support for the newer Responses API, meaning you must rely on the Chat Completions API. That is a real handicap, since OpenAI recommends the Responses API for most projects due to its better API design, built-in agentic support, and improved cache utilization that cuts down on usage costs.\n\nThe OpenTelemetry GenAI team is hard at work, however: a PR for instrumenting the `create` method of the Responses API was [merged in April](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/4474), and more targeted updates should follow.\n\n### The Core Agentic Setup\n\nOur web application exposes an NBA Reporter agent that reports the latest NBA news for a given topic (e.g., `general` or `finals`) and performs on-demand analysis. When the user asks follow-up questions, our agent uses server-side sessions to take the conversation history into account and provide context-aware output.\n\n**Defining the Agentic Workflow**\n\nThe [`agent_service.py`](https://github.com/SigNoz/examples/blob/main/python/opentelemetry-llm-demo/app/agent_service.py) file defines the agent configuration and the executor function that our FastAPI endpoint uses to serve user queries.\n\nWe initialize the agent with the system prompt, tools, input guardrails, and the model name. 
Since our agent is not expected to perform complex operations, we use the latest available version of GPT-5.4 Mini to minimize usage costs without sacrificing output quality.\n\n```python\nfrom agents import Agent, WebSearchTool\n\n# NBA_INTERACTIVE_PROMPT, OPENAI_MODEL, and the tool/guardrail functions are defined elsewhere in the module\nNBA_AGENT = Agent(\n    name=\"NBA_Reporter\",\n    instructions=NBA_INTERACTIVE_PROMPT,\n    tools=[WebSearchTool(), calculate_win_percentage],\n    input_guardrails=[nba_content_guardrail],\n    model=OPENAI_MODEL,\n)\n```\n\nThe `calculate_win_percentage` tool lets the agent calculate a team’s win percentage deterministically and display it in a consistent format, rather than relying on the model’s own arithmetic.\n\n```python\nfrom agents import function_tool\n\n\n@function_tool\ndef calculate_win_percentage(wins: int, losses: int) -> str:\n    \"\"\"Calculates the winning percentage for an NBA team given their wins and losses.\"\"\"\n\n    total_games = wins + losses\n    if total_games == 0:\n        return \".000\"\n\n    # e.g. 50 wins, 32 losses -> 0.610 -> .610 (NBA style, no leading zero)\n    return f\"{wins / total_games:.3f}\".lstrip(\"0\")\n```\n\nIn our case, we use an input guardrail to limit discussions to basketball topics. When a user query of five or more characters contains none of the predefined keywords, our function returns a `GuardrailFunctionOutput` with `tripwire_triggered` set to `True`.\n\n```python\nfrom agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail\n\n\n@input_guardrail()\ndef nba_content_guardrail(\n    context: RunContextWrapper[None],\n    agent: Agent,\n    input_data: str | list,\n) -> GuardrailFunctionOutput:\n    \"\"\"Ensures the user query is relevant to basketball/NBA.\"\"\"\n\n    keywords = [\n        \"nba\",\n        \"basketball\",\n        \"player\",\n        ... 
# removed for brevity\n    ]\n\n    # extract the last user message from chat history\n    if isinstance(input_data, list):\n        latest_user_message = next(\n            (\n                item.get(\"content\", \"\")\n                for item in reversed(input_data)\n                if isinstance(item, dict) and item.get(\"role\") == \"user\"\n            ),\n            \"\",\n        )\n        input_query = latest_user_message\n    else:\n        input_query = input_data\n\n    input_query = input_query.lower()\n    is_relevant = any(keyword in input_query for keyword in keywords)\n\n    # very short inputs (e.g. greetings) and on-topic queries pass through\n    if len(input_query) < 5 or is_relevant:\n        return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)\n\n    return GuardrailFunctionOutput(\n        tripwire_triggered=True,\n        output_info={\n            \"reason\": \"The request is off-topic. Please ask questions relevant to NBA or basketball.\"\n        },\n    )\n```\n\nSetting `tripwire_triggered=True` makes the Agents SDK raise an `InputGuardrailTripwireTriggered` exception and stop the agent execution loop. We intercept this exception with a FastAPI exception handler that records it on the current span and returns an appropriate response.\n\n```python\nfrom agents import InputGuardrailTripwireTriggered\nfrom fastapi import Request\nfrom fastapi.responses import JSONResponse\nfrom opentelemetry import trace\n\n\n# `app` is the FastAPI application instance defined elsewhere in the demo\n@app.exception_handler(InputGuardrailTripwireTriggered)\nasync def handle_guardrail_block(\n    request: Request,\n    exc: InputGuardrailTripwireTriggered,\n) -> JSONResponse:\n    guardrail_msg = exc.guardrail_result.output.output_info\n    span = trace.get_current_span()\n\n    # record the exception as an event on the currently active span\n    span.record_exception(exc)\n    return JSONResponse(status_code=400, content={\"detail\": guardrail_msg})\n```\n\n**Running the Agent and Managing Session Context**\n\nThe `run_agent_turn` function completes our agentic implementation.\n\nThe [`OpenAIConversationsSession`](https://openai.github.io/openai-agents-python/sessions/) object tells the runner to store conversation context in OpenAI-managed, server-side sessions, so the context persists across turns. 
Each API response includes a `session_id` that uniquely identifies the session context: user and agent messages, tool call metadata, and so on. If the user includes this `session_id` in a follow-up request, the agent automatically retrieves the conversation context before processing the latest query; otherwise, a new session is created to store the context for the current query.\n\n```python\nfrom agents import OpenAIConversationsSession, Runner\n\n\ndef run_agent_turn(\n    topic: str,\n    user_message: str | None,\n    session_id: str | None,\n) -> dict:\n    nba_topic = _validate_topic(topic)\n    prompt = build_nba_turn_prompt(nba_topic, user_message)\n\n    # if no session ID was given, the SDK internally creates one during the turn\n    # subsequent calls that pass the ID maintain the conversation context\n    session = OpenAIConversationsSession(conversation_id=session_id)\n    result = Runner.run_sync(NBA_AGENT, prompt, session=session)\n\n    message = sanitize_agent_message((result.final_output or \"\").strip())\n    ...  # skipped for brevity\n\n    return {\n        \"topic\": topic,\n        \"session_id\": session.session_id,\n        \"message\": message,\n        \"model\": OPENAI_MODEL,\n        \"usage\": usage,\n    }\n```\n\n## Visualizing Agentic Workflows with OpenTelemetry\n\n### Interacting with the Agent\n\nLet’s start by asking the NBA Reporter agent a specific question about the upcoming NBA finals.\n\nThe response contains the agent’s analysis of Karl-Anthony Towns’ (KAT) recent performances for the New York Knicks, the token usage stats, and a unique `session_id`.\n\nLet’s attach the `session_id` to our follow-up question about this player. 
We deliberately won't refer to the player by name, to see whether the agent can access the existing context.\n\nGreat, the agent correctly identifies KAT from our previous query and returns the same `session_id`, confirming that it reused the active session instead of creating a new one.\n\nNow, let’s see what happens if we ask it an off-topic question, such as the weather in Barcelona. The guardrail blocks the request, and the API responds with a `400` status and the off-topic message.\n\nSince our agent is equipped with a web search tool, any user could potentially prompt it to run costly, time-intensive searches or access malicious resources. As an LLM application developer, it falls on you to implement strict guardrails to prevent data leaks and exploits.\n\n### Exploring the Steps within the Agent Workflow\n\nNow, let’s see what the trace looks like for the successful request with the `session_id` in the payload, and compare it to the one that failed the guardrail check.\n\nExpanding the trace for the follow-up request, we can see that the workflow makes multiple API calls. Clicking on the initial GET and POST spans reveals the agent fetching and, most likely, saving conversation data, respectively.\n\nWithin the `invoke_agent` span, we can see that the guardrail check has been recorded as well, capturing the guardrail function name and whether the guardrail was triggered.\n\nThe span named after the model stores the conversation history and the latest model output in the `gen_ai.input.messages` and `gen_ai.output.messages` span attributes, respectively.\n\nWhile helpful for debugging, these span attributes can be extremely verbose and may contain PII or other sensitive data. To explicitly disable capturing message content, set the following environment variable:\n\n```\nexport OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=0\n```\n\nOpening the detailed view of our failed trace, we can see that the root span contains the exception event we recorded in our FastAPI exception handler. 
The `gen_ai.guardrail.triggered` attribute on the `guardrail_check` span is set to `true`, clearly indicating that the guardrail in question blocked further execution.\n\nYou may have noticed that multiple spans in the trace tree are simply titled `unknown`. These represent internal API paths that the current instrumentation doesn’t yet fully map.\n\nHowever, you can still deduce what is happening within these spans by looking at their child spans, which are correctly labeled (like the `guardrail_check` span).\n\nThere is an [open issue](https://github.com/open-telemetry/opentelemetry-python-genai/issues/86) tracking many such conformance gaps for the Agents SDK, so we can expect significant improvements in the near future.\n\n**Monitoring LLM Usage and Agent Performance**\n\nWhile we focused primarily on traces to map out our agent's lifecycle, the instrumentation also emits core metrics such as `gen_ai.client.operation.duration` and `gen_ai.client.token.usage` out of the box. These are useful for building dashboards that track token spend and call volume over time.\n\nYou can also import the SigNoz [dashboard template](https://signoz.io/docs/dashboards/dashboard-templates/openai-dashboard/) for the OpenAI Python SDK. It is largely compatible with our LLM application, except for the cache utilization metrics, which the current Agents SDK instrumentation doesn’t export yet.\n\n## Wrapping Up\n\nYou now have the foundational knowledge required to start instrumenting your LLM-based applications with OpenTelemetry.\n\nWe began by exploring the unique observability challenges introduced by non-deterministic models and complex agentic workflows. 
From there, we wired up a FastAPI application using the OpenAI Agents SDK, navigated the evolving GenAI semantic conventions, and visualized how traces reveal exactly what happens under the hood.\n\n[SigNoz](https://signoz.io/) is an OpenTelemetry-native platform that visualizes traces, metrics, and logs in a single pane, making it easy to debug complex agentic loops and monitor token spend across your AI deployments.\n\nIf you’re interested in trying out SigNoz for your LLM applications, [sign up](https://signoz.io/teams/) for a 30-day free trial (no credit card required).", "url": "https://wpnews.pro/news/observing-llm-applications-with-opentelemetry", "canonical_source": "https://signoz.io/blog/opentelemetry-llm", "published_at": "2026-06-03 00:00:00+00:00", "updated_at": "2026-06-05 06:53:26.344481+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "generative-ai", "mlops", "ai-tools"], "entities": ["OpenAI", "ChatGPT", "OpenTelemetry"], "alternates": {"html": "https://wpnews.pro/news/observing-llm-applications-with-opentelemetry", "markdown": "https://wpnews.pro/news/observing-llm-applications-with-opentelemetry.md", "text": "https://wpnews.pro/news/observing-llm-applications-with-opentelemetry.txt", "jsonld": "https://wpnews.pro/news/observing-llm-applications-with-opentelemetry.jsonld"}}