[ARTICLE · art-22273] src=signoz.io pub= topic=large-language-models verified=true sentiment=· neutral

Observing LLM Applications with OpenTelemetry

OpenTelemetry, an open-source observability framework, is being adopted to monitor non-deterministic outputs and performance issues in LLM-based applications. The technology addresses challenges like hallucinations, inconsistent responses, and provider-side latency spikes that arise when integrating large language models into production systems. Developers can use OpenTelemetry's standardized instrumentation to collect telemetry data without vendor lock-in, enabling backend-agnostic monitoring of AI features.

13 min read · Published Jun 3, 2026

Ever since OpenAI launched ChatGPT in November 2022, AI usage has exploded worldwide. Integrating LLMs into applications began soon after, rapidly going from an experimental, nice-to-have feature to a competitive, baseline requirement.

And while you can find an AI implementation in almost every product today, shipping production-ready LLM features introduces its own set of challenges that developers must contend with.

In this article, we’ll dive into why observing LLM-based applications is now a critical requirement, what OpenTelemetry is, and how to integrate it into your applications with a practical demo.

During this process, we will also look at the current maturity level of LLM-specific OpenTelemetry libraries, the GenAI Semantic Conventions, and some practical challenges you can face while instrumenting your LLM applications.

Why do LLM applications need observability?

If you are already familiar with the challenges of maintaining LLM applications across their lifecycle, feel free to skip to the next section that discusses OpenTelemetry.

Handling non-determinism

Now you might think that observing your LLM integrations is not that different from classic observability. The key difference is that the output generated by LLMs is non-deterministic: the same input can produce completely different outputs across runs.

Developers often equip LLMs with dedicated tools since models can hallucinate unpredictably on tasks that require precise, deterministic output.

Ensuring context-appropriate responses

Non-determinism does not mean that the responses are actually incorrect. In most scenarios though, developers likely want their responses to be structured in a certain way.

For example, the response "very likely" might suit a query like "chances of rain tomorrow", but the same response to "chances of stock market climbing tomorrow" might be unacceptable, since the user likely expects more nuance from the application.

Ensuring that responses remain consistent across a range of user queries is one of the key factors that separates a polished LLM product from an unreliable one.

Managing quality across updates

LLM providers frequently release model updates, modify their backends, and publish usage guides. Meanwhile, developers also experiment with model configurations and share the ones that work for them. All in all, the space is developing quickly, and each of these factors can affect the response quality of your LLM setup.

As a practical example, LLM providers can suffer "brown-outs" where their infrastructure cannot keep up with user demand, leading to latency spikes, timeouts, or even degraded response quality in certain scenarios. This makes it critical to observe how your LLM setup holds up over time.

What is OpenTelemetry?

OpenTelemetry (OTel) is a Cloud Native Computing Foundation (CNCF) project aimed at standardizing the way we instrument applications for generating telemetry data. Before OpenTelemetry arrived, telemetry data lived in silos and often had little or no correlation between signals.

It follows a specification-driven development model that standardizes telemetry generation and collection details, meaning any compatible backend can process and visualize telemetry data emitted via its SDKs.

As there is no need to rewrite the entire instrumentation plumbing each time you change observability backends, there is no vendor lock-in.

Implementing OpenTelemetry in LLM Applications

Prerequisites

While earlier Python versions like 3.10 may technically work, they are nearing their end of life. Python 3.12 will continue to receive security updates till late 2028.

Setting up SigNoz

SigNoz is an OpenTelemetry-native observability platform that provides logs, traces, and metrics in a unified platform.

- Sign up for a free SigNoz Cloud account.
- Follow the documentation to create ingestion keys for your account.
- Ensure the region and ingestion key values are readily accessible for the following steps.

Once done, you’re ready to configure the application and point it towards your SigNoz instance.

Running the Demo Application

Application Setup

Clone the SigNoz Examples repository and navigate to the application folder:

git clone https://github.com/SigNoz/examples.git
cd examples/python/opentelemetry-llm-demo

Create and activate a Python virtual environment.

python3.12 -m venv .venv
source .venv/bin/activate

The requirements.txt file contains all the necessary OpenTelemetry Python packages. Install them by running:

python -m pip install -r requirements.txt

The following dependencies enable the OpenTelemetry instrumentation process:

- opentelemetry-distro: This provides a convenient mechanism to automatically configure some of the more common options for users, helping us get started with OpenTelemetry auto-instrumentation quickly.
- opentelemetry-exporter-otlp: This package installs the OTLP exporters required to transmit telemetry data to any OpenTelemetry backend.

The following command detects standard libraries or frameworks (such as FastAPI) used in our application, and installs their respective instrumentation libraries:

opentelemetry-bootstrap --action=install

Finally, we will configure our environment variables and start the application, wrapping the entrypoint within opentelemetry-instrument to auto-instrument our application code.

OPENAI_API_KEY="<your-openai-api-key>" \
OTEL_EXPORTER_OTLP_ENDPOINT="https://ingest.<your-region>.signoz.cloud:443" \
OTEL_EXPORTER_OTLP_HEADERS="signoz-ingestion-key=<your-ingestion-key>" \
OTEL_SERVICE_NAME="opentelemetry-llm-demo" \
OTEL_RESOURCE_ATTRIBUTES="service.version=0.1.0,deployment.environment=dev" \
OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf" \
OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true \
opentelemetry-instrument fastapi run --port 8085 --workers 1

Replace the <your-region> and <your-ingestion-key> placeholders with the region of your SigNoz workspace (e.g., us, in) and your newly created ingestion key. You will also need to supply your OpenAI API key.
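
As a quick sanity check before launching, the sketch below flags variables that still contain an unreplaced <...> placeholder and shows the per-signal URLs that an OTLP/HTTP exporter derives from the base endpoint by appending /v1/traces, /v1/metrics, and /v1/logs, as described in the OTLP exporter specification. The helper names find_unreplaced_placeholders and signal_urls are hypothetical and not part of any OpenTelemetry package.

```python
import re

# Matches leftovers such as <your-region> or <your-ingestion-key>.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_unreplaced_placeholders(env: dict[str, str]) -> list[str]:
    """Return the names of variables whose value still holds a <...> placeholder."""
    return sorted(name for name, value in env.items() if PLACEHOLDER.search(value))


def signal_urls(base_endpoint: str) -> dict[str, str]:
    """Derive the per-signal OTLP/HTTP URLs from the base endpoint."""
    base = base_endpoint.rstrip("/")
    return {signal: f"{base}/v1/{signal}" for signal in ("traces", "metrics", "logs")}


if __name__ == "__main__":
    env = {
        "OTEL_EXPORTER_OTLP_ENDPOINT": "https://ingest.us.signoz.cloud:443",
        "OTEL_EXPORTER_OTLP_HEADERS": "signoz-ingestion-key=<your-ingestion-key>",
    }
    print("Still to fill in:", find_unreplaced_placeholders(env))
    print("Traces go to:", signal_urls(env["OTEL_EXPORTER_OTLP_ENDPOINT"])["traces"])
```

Running a check like this first can save a debugging round trip, since a leftover placeholder typically shows up only later as failed exports.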

OTEL_RESOURCE_ATTRIBUTES defines the metadata attached to each batch of telemetry that goes out of our application, and the service name opentelemetry-llm-demo ensures the OTel backend can correctly identify the telemetry source.
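
To make the OTEL_RESOURCE_ATTRIBUTES format concrete, here is a simplified sketch of how a comma-separated key=value string maps to individual attributes. The function parse_resource_attributes is a hypothetical helper written for illustration. The SDK performs its own parsing, so treat this as an approximation of the idea, not its actual code.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Split a comma-separated key=value string into a dictionary."""
    attributes = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        # Values may be percent-encoded, so decode them after trimming.
        attributes[key.strip()] = unquote(value.strip())
    return attributes


if __name__ == "__main__":
    raw = "service.version=0.1.0,deployment.environment=dev"
    for key, value in parse_resource_attributes(raw).items():
        print(f"{key} -> {value}")
```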

Setting OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED to true ensures that application logs are exported alongside and correlated with the generated traces.

Run the command, and the FastAPI server should start on port 8085. Calling the / endpoint will return a simple success response.

(.venv)  ❯ curl "http://127.0.0.1:8085/"
{"message":"OpenTelemetry NBA agent demo is running"}

Before we move ahead, let’s look at how the OpenTelemetry community is standardizing telemetry conventions, capture, and export for AI-driven applications.

Evolving GenAI Standards

Throughout this article, we’ve used the term "LLMs", but as you become familiar with OpenTelemetry, you will see the term Generative AI (or GenAI) being used almost exclusively. OpenTelemetry uses the umbrella term GenAI to refer to any application utilizing AI models.

Later, when we use our application, you’ll see that all telemetry attributes and metrics ingested by SigNoz have the gen_ai.* prefix, such as gen_ai.agent.name.

It is crucial to understand that these standards are evolving rapidly and are still technically in Development status. Because of this fast-paced development, and the criticality of the GenAI project, the OpenTelemetry team recently created a separate, dedicated repository to maintain GenAI instrumentation libraries.

This "immaturity" has a real-world impact on developers. For example, the opentelemetry-bootstrap tool currently doesn’t recognize the OpenAI Agents SDK that we’ll be using in our application, so it does not install the matching instrumentation library.

To counter this, we have manually added the corresponding entry to the requirements.txt file.

Further, you will often find that the current GenAI instrumentation libraries struggle to maintain full coverage for these rapidly changing AI implementations. We will explore this in more detail when we analyze our app’s trace output.

Dissecting the OpenTelemetry LLM Application

Now let's go over the key implementation details to fully understand how the agentic workflow has been wired up. Skip to the next section if you want to jump directly to the interactive part.

Dependency Choices

Besides the essential OpenTelemetry dependencies we’ve already discussed above, the application features:

- FastAPI as the web framework, providing strong request-response data validation guarantees, automatic OpenAPI schema generation, and first-class async support, making it an excellent choice for building performant web APIs to serve AI workloads.
- OpenAI Agents SDK for building our agentic workflow. It ships with all the bells and whistles you expect from an AI framework, such as sandbox mode and human-in-the-loop mechanisms. OpenAI has published extensive documentation, making it easy to get started, and the project has amassed about 27,000 GitHub stars at the time of writing.
- OpenAI Agents SDK Instrumentation, which the opentelemetry-bootstrap command does not install, was added manually to ensure our workflow generates telemetry upon execution.

The OpenAI Python SDK is a completely fine choice for building production-ready LLM applications that don’t need the complete set of agentic orchestration capabilities.

Note that the corresponding OTel instrumentation library currently lacks meaningful support for the newer Responses API, meaning you must rely on the Chat Completions API. This is a real handicap, since OpenAI recommends the Responses API for most projects due to its better API design, built-in agentic support, and improved cache utilization that cuts down on usage costs.

The OpenTelemetry GenAI team is hard at work, however: a PR for instrumenting the create method of the Responses API was merged in April. We should expect more targeted updates in the near future.

The Core Agentic Setup

Our web application exposes an NBA Reporter agent that reports the latest NBA news given a topic (e.g., general or finals), and performs on-demand analysis. When the user asks follow-up questions, our agent utilizes server sessions to analyze conversation history and provide context-aware output.

Defining the Agentic Workflow

The agent_service.py file defines the agent configuration and the executor function used in our FastAPI endpoint to serve user queries.

We initialize the agent with the system prompt, tools, input guardrails, and the model name. Since our agent is not expected to perform complex operations, we use the latest available version of GPT-5.4 Mini to minimize our usage costs without sacrificing the output quality.

NBA_AGENT = Agent(
    name="NBA_Reporter",
    instructions=NBA_INTERACTIVE_PROMPT,
    tools=[WebSearchTool(), calculate_win_percentage],
    input_guardrails=[nba_content_guardrail],
    model=OPENAI_MODEL,
)

The calculate_win_percentage tool lets the agent calculate a team’s win percentage accurately and display it in a consistent format.

@function_tool
def calculate_win_percentage(wins: int, losses: int) -> str:
    """Calculates the winning percentage for an NBA team given their wins and losses."""

    total_games = wins + losses
    if total_games == 0:
        return ".000"

    return f"{wins / total_games:.3f}".lstrip("0")
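
To see what the tool hands back to the model, the sketch below repeats the same arithmetic as a plain function, without the @function_tool decorator, so it can run without the Agents SDK installed. The name format_win_percentage is hypothetical and used here only for illustration.

```python
def format_win_percentage(wins: int, losses: int) -> str:
    """Same arithmetic as the tool above, minus the Agents SDK decorator."""
    total_games = wins + losses
    if total_games == 0:
        return ".000"  # avoid dividing by zero before any game is played

    # Format to three decimals, then drop the leading zero (0.610 -> .610).
    return f"{wins / total_games:.3f}".lstrip("0")


if __name__ == "__main__":
    for wins, losses in [(50, 32), (41, 41), (82, 0), (0, 0)]:
        print(wins, losses, format_win_percentage(wins, losses))
```

Note that a perfect record keeps its leading digit (1.000), since only leading zeros are stripped.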

In our case, we use an input guardrail to limit discussions to basketball topics. On detecting a user query that is at least five characters long and does not contain any of the pre-defined keywords, our function returns a GuardrailFunctionOutput with the tripwire_triggered parameter set to True.

@input_guardrail()
def nba_content_guardrail(
    context: RunContextWrapper[None],
    agent: Agent,
    input_data: str | list,
) -> GuardrailFunctionOutput:
    """Ensures the user query is relevant to basketball/NBA."""

    keywords = [
        "nba",
        "basketball",
        "player",
        ... # removed for brevity
    ]

    if isinstance(input_data, list):
        latest_user_message = next(
            (
                item.get("content", "")
                for item in reversed(input_data)
                if isinstance(item, dict) and item.get("role") == "user"
            ),
            "",
        )
        input_query = latest_user_message
    else:
        input_query = input_data

    input_query = input_query.lower()
    is_relevant = any(keyword in input_query for keyword in keywords)

    if len(input_query) < 5 or is_relevant:
        return GuardrailFunctionOutput(tripwire_triggered=False, output_info=None)

    return GuardrailFunctionOutput(
        tripwire_triggered=True,
        output_info={
            "reason": "The request is off-topic. Please ask questions relevant to NBA or basketball."
        },
    )
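
The keyword check itself can be exercised without the Agents SDK. The sketch below mirrors the guardrail's decision logic as plain functions so its behavior is easy to test. The names latest_user_text and is_allowed are hypothetical, and KEYWORDS contains only the three keywords visible above, since the rest of the list is elided.

```python
from __future__ import annotations

# Only the keywords shown in the article; the full list is elided there.
KEYWORDS = ["nba", "basketball", "player"]


def latest_user_text(input_data: str | list) -> str:
    """Return the raw string, or the most recent user message from a history list."""
    if isinstance(input_data, list):
        return next(
            (
                item.get("content", "")
                for item in reversed(input_data)
                if isinstance(item, dict) and item.get("role") == "user"
            ),
            "",
        )
    return input_data


def is_allowed(input_data: str | list) -> bool:
    """True when the guardrail would let the query through."""
    query = latest_user_text(input_data).lower()
    # Very short inputs pass, as does anything containing a keyword.
    return len(query) < 5 or any(keyword in query for keyword in KEYWORDS)


if __name__ == "__main__":
    for query in ["Who won the NBA finals?", "What is the weather in Barcelona?", "hi"]:
        print(f"{query!r}: allowed={is_allowed(query)}")
```

Because the check is a substring match, an unrelated query such as "Fix my music player app" also passes. A keyword filter like this is a coarse first line of defense, not a complete topic classifier.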

The tripwire_triggered=True parameter signals the Agent to raise the InputGuardrailTripwireTriggered exception and stop the agent execution loop. We intercept these exceptions through the FastAPI exception handlers to record the exception event and return an appropriate response.

@app.exception_handler(InputGuardrailTripwireTriggered)
async def handle_guardrail_block(
    request: Request,
    exc: InputGuardrailTripwireTriggered,
) -> JSONResponse:
    guardrail_msg = exc.guardrail_result.output.output_info
    span = trace.get_current_span()

    span.record_exception(exc)
    return JSONResponse(status_code=400, content={"detail": guardrail_msg})

Running the Agent and Managing Session Context

The run_agent_turn function completes our agentic implementation.

The OpenAIConversationsSession object signals the workflow to leverage OpenAI-managed server sessions to maintain conversation context across turns. Each API response includes a session_id that uniquely identifies the session context: user and agent responses, tool call metadata, etc. If the user includes this session_id in a follow-up request, the agent automatically retrieves the conversation context before processing the latest query. Otherwise, it creates a new session to store the context for the current query.

def run_agent_turn(
    topic: str,
    user_message: str | None,
    session_id: str | None,
) -> dict:
    nba_topic = _validate_topic(topic)
    prompt = build_nba_turn_prompt(nba_topic, user_message)

    session = OpenAIConversationsSession(conversation_id=session_id)
    result = Runner.run_sync(NBA_AGENT, prompt, session=session)

    message = sanitize_agent_message((result.final_output or "").strip())
    ... # skipped for brevity

    return {
        "topic": topic,
        "session_id": session.session_id,
        "message": message,
        "model": OPENAI_MODEL,
        "usage": usage,
    }
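
The reuse-or-create behavior described above can be pictured with a toy in-memory store. This is not how the Agents SDK or OpenAI's servers implement sessions. InMemorySessionStore and its methods are hypothetical, and creating a fresh session for an unrecognized ID is an assumption made purely to keep the sketch small.

```python
from __future__ import annotations

import uuid


class InMemorySessionStore:
    """Toy stand-in for server-managed sessions, for illustration only."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}

    def get_or_create(self, session_id: str | None) -> str:
        """Reuse a known session, otherwise start a new one."""
        if session_id is not None and session_id in self._history:
            return session_id
        # Assumption: unknown or missing IDs simply get a fresh session.
        new_id = f"conv_{uuid.uuid4().hex}"
        self._history[new_id] = []
        return new_id

    def append(self, session_id: str, message: str) -> None:
        """Store one conversation turn under the given session."""
        self._history[session_id].append(message)

    def history(self, session_id: str) -> list[str]:
        """Return a copy of the stored turns for the session."""
        return list(self._history[session_id])


if __name__ == "__main__":
    store = InMemorySessionStore()
    first = store.get_or_create(None)  # first request: no session_id supplied
    store.append(first, "How has KAT played recently?")
    follow_up = store.get_or_create(first)  # follow-up: same session_id supplied
    store.append(follow_up, "How many rebounds did he average?")
    print(first == follow_up, len(store.history(first)))
```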

Visualizing Agentic Workflows with OpenTelemetry

Interacting with the Agent

Let’s start by asking the NBA Reporter agent a specific question about the upcoming NBA finals.

The response contains the agent’s analysis of KAT’s recent performances for the New York Knicks, the token usage stats, and a unique session_id.

Let’s attach the session_id to our follow-up question about this player. We will not explicitly refer to the player by name to see if the agent can access the existing context.

Great, the agent correctly identifies KAT from our previous query and returns the same session_id, confirming that it re-used the active session instead of creating a new one.

Now, let’s see what happens if we ask it to answer an off-topic question, such as the weather in Barcelona.

Since our agent is equipped with a web search tool, any user could potentially prompt the agent to run costly, time-intensive searches, or access malicious resources. As an LLM application developer, it falls on you to implement strict guardrails to prevent data leaks and exploits.

Exploring the Steps within the Agent Workflow

Now, let’s see what the trace looks like for the successful request with the session_id in the payload, and compare it to the one that failed the guardrail check.

Expanding the trace execution for the follow-up request, we can see that the workflow makes multiple API calls. Clicking on the initial GET and POST spans reveals the agent fetching and most likely saving conversation data, respectively.

Within the invoke_agent span, we can see that the guardrail check has been documented as well, capturing the guardrail function name and whether the guardrail was triggered.

The span with the model name stores the conversation history and the last model output in the gen_ai.input.messages and gen_ai.output.messages span attributes, respectively.

While helpful for debugging, these span attributes can be incredibly verbose and may contain sensitive PII. To ensure user input and output content capture is explicitly disabled, you can use the following environment variable:

export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=0

Opening the detailed view of our failed trace, we can see that the root span links to the exception event we captured in our FastAPI exception handler. The gen_ai.guardrail.triggered attribute on the guardrail_check span is set to true, clearly indicating that the guardrail in question blocked further execution.

You may have noticed that multiple spans in the trace tree are simply titled unknown. These represent certain internal API paths that haven’t been fully mapped out as part of the current instrumentation process.

However, you can still deduce what is happening within these spans by referring to their child spans, which are correctly labelled (like the guardrail_check span).

There is an open PR addressing many such conformance issues for the Agents SDK, so we can expect significant improvements in the near future.

Monitoring LLM Usage and Agent Performance

While we focused primarily on traces to map out our agent's lifecycle, the instrumentation wrapper simultaneously exposes core metrics like gen_ai.client.operation.duration and gen_ai.client.token.usage out of the box, useful for building dashboards on token spend and call volumes over time.
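
As a rough picture of what a token-spend panel computes, the sketch below sums token counts per gen_ai.token.type, the attribute the GenAI semantic conventions use to separate input from output tokens. The data points are made-up sample values and total_tokens_by_type is a hypothetical helper. A real backend would run this aggregation over the exported gen_ai.client.token.usage metric.

```python
from collections import defaultdict


def total_tokens_by_type(points: list[dict]) -> dict[str, int]:
    """Sum token counts grouped by the gen_ai.token.type attribute."""
    totals = defaultdict(int)
    for point in points:
        totals[point["gen_ai.token.type"]] += point["tokens"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up sample values standing in for recorded measurements.
    sample_points = [
        {"gen_ai.token.type": "input", "tokens": 1200},
        {"gen_ai.token.type": "output", "tokens": 250},
        {"gen_ai.token.type": "input", "tokens": 500},
    ]
    print(total_tokens_by_type(sample_points))
```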

You can also import the SigNoz dashboard template for the OpenAI Python SDK. It is highly compatible with our LLM application, except for the cache utilization metrics, which aren’t yet exported by the current Agents SDK instrumentation path.

Wrapping Up

With this guide, you now have the fundamental knowledge required to begin instrumenting your LLM-based applications with OpenTelemetry.

We began by exploring the unique observability challenges introduced by non-deterministic models and complex agentic workflows. From there, we wired up a FastAPI application using the OpenAI Agents SDK, navigated the developing GenAI semantic conventions, and visualized how traces reveal exactly what happens under the hood.

SigNoz is an OpenTelemetry-native platform that visualizes traces, metrics, and logs in a single pane, making it incredibly easy to debug complex agentic loops and monitor token spend across your AI deployments.

If you’re interested in trying out SigNoz for your LLM applications, sign up for a 30-day free trial (no credit card required).
