[ARTICLE · art-45646] src=dev.to ↗ pub= topic=ai-agents verified=true sentiment=↑ positive

OpenTelemetry Tells You What Your Agent Did. Not Whether It Was OK.

A developer built Ballast, an open-source reliability layer for AI agents that sits on top of OpenTelemetry. While OpenTelemetry captures telemetry like token counts and latency, Ballast adds reliability semantics such as guardrail results and trace completeness. The project is available on GitHub as a TypeScript SDK with a local JSONL store and CLI viewer.

3 min read · 1 view · published Jun 30, 2026

OpenTelemetry's GenAI conventions will tell you your agent called Claude, spent 1,843 input tokens, took 900 milliseconds, and returned without an error. They will not tell you the answer cited zero sources, that the loop spun nineteen times before it gave up, or that the model never saw the guardrail that was supposed to stop it. Those are the facts that decide whether an agent is safe to run unattended. No standard layer captures them.

So I built a small one. ballast sits on top of OpenTelemetry: OTel tells you what happened; ballast tells you whether it was acceptable.

OTel already owns the telemetry substrate: provider, model, token counts, latency, status. That problem is solved, and solved as a standard. ballast doesn't touch it. What it adds is the reliability layer, expressed as ballast.* attributes and events riding on the same gen_ai.* spans.
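To make the layering concrete, here is a minimal sketch of how reliability attributes could sit beside GenAI telemetry on one span. The gen_ai.* keys follow the OpenTelemetry GenAI semantic conventions; of the ballast.* keys, only ballast.trace.completeness is named in this article, and the other two (plus the layerAttributes helper and the 'partial' value) are hypothetical names chosen for illustration, not ballast's published schema.

```typescript
type AttributeValue = string | number | boolean;
type Attributes = Record<string, AttributeValue>;

// Telemetry the OTel GenAI conventions already describe.
const telemetry: Attributes = {
  'gen_ai.operation.name': 'chat',
  'gen_ai.request.model': 'claude-sonnet-4-5',
  'gen_ai.usage.input_tokens': 1843,
};

// Reliability facts layered onto the same span.
const reliability: Attributes = {
  'ballast.trace.completeness': 'partial',    // key named in the article; value assumed
  'ballast.guardrail.evidence.passed': false, // hypothetical key
  'ballast.loop.iterations': 19,              // hypothetical key
};

// Merge the two layers, rejecting any reliability key that strays outside
// the ballast.* namespace so telemetry can never be overwritten.
function layerAttributes(base: Attributes, extra: Attributes): Attributes {
  for (const key of Object.keys(extra)) {
    if (key.indexOf('ballast.') !== 0) {
      throw new Error(`reliability attribute outside ballast.*: ${key}`);
    }
  }
  return { ...base, ...extra };
}

const spanAttributes = layerAttributes(telemetry, reliability);
```

The namespace check is the design point: the reliability layer only ever adds keys, so a collector that knows nothing about ballast still sees an ordinary GenAI span.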

You instrument an existing call by wrapping it. Nothing about your stack changes:

import { wrap, evidenceGuardrail } from '@michaeltuszynski/ballast';

const answer = await wrap(
  { name: 'gen_ai.chat', system: 'anthropic', model: 'claude-sonnet-4-5' },
  async (ctx) => {
    // callYourModel is a placeholder for the model call you already have.
    const res = await callYourModel();
    // Record usage on the span.
    ctx.setUsage(res.inputTokens, res.outputTokens, res.costUsd);
    // Record a reliability result on the same span.
    ctx.guardrail(evidenceGuardrail(res.text));
    return res.text;
  },
);

wrap opens a real OTel span, lets you record usage and reliability results onto it, and exports a protocol-conformant record to a runs.jsonl file. Then the ballast runs command reads it back.
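The write-then-read-back cycle can be sketched with an in-memory JSONL store. This is an illustration under assumptions, not ballast's code: the RunRecord shape, the appendRecord and readRecords helpers, and the timestamps are all hypothetical, and the real protocol's fields may differ.

```typescript
// Hypothetical record shape for illustration only.
interface RunRecord {
  name: string;
  startedAt: string; // ISO 8601 timestamp
  durationMs: number;
  attributes: Record<string, string | number | boolean>;
}

// JSONL is one JSON document per newline-terminated line, so appending a
// run never rewrites the runs stored before it.
function appendRecord(store: string, record: RunRecord): string {
  return store + JSON.stringify(record) + '\n';
}

// Reading back is the inverse: split on newlines, skip blanks, parse each line.
function readRecords(store: string): RunRecord[] {
  return store
    .split('\n')
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RunRecord);
}

let store = '';
store = appendRecord(store, {
  name: 'gen_ai.chat',
  startedAt: '2026-06-30T00:00:00.000Z', // illustrative value
  durationMs: 900,
  attributes: { 'ballast.trace.completeness': 'complete' },
});
store = appendRecord(store, {
  name: 'gen_ai.chat',
  startedAt: '2026-06-30T00:00:05.000Z', // illustrative value
  durationMs: 900,
  attributes: { 'ballast.trace.completeness': 'partial' },
});

const runs = readRecords(store);
```

Because JSON.stringify escapes newlines inside string values, each record is guaranteed to occupy exactly one line, which is what makes an append-only viewer straightforward.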

The first design had ballast defining its own trace schema: provider, model, tokens, the works. I had a second model review the spec before I wrote a line of code, and it caught the mistake in one paragraph: OpenTelemetry already standardizes all of that. Reinventing it would have put ballast in a fight it can't win against a convention with a working group behind it.

So the protocol got rebuilt on the OTel GenAI semantic conventions, and ballast's surface shrank to the one thing nobody standardizes: reliability semantics. That review is why the repo exists in the shape it does. The lesson generalizes: the substrate is rarely the greenfield you assume it is.

ballast is narrow, and staying narrow is the point.

It's not an agent framework. No chains, no memory, no tool execution, no orchestration. Bring your own runtime (Claude Code, the raw SDK, LangChain) and wrap the calls. The moment a reliability layer grows an orchestration engine, it stops being a reliability layer.

It's not a tracing backend. If you only need raw LLM telemetry, use OpenTelemetry, Langfuse, or OpenLLMetry directly. ballast emits OTel; it doesn't replace your collector.

And it doesn't pretend to see everything. Wrapping arbitrary agent code means hidden retries, streaming partials, and tool calls can slip past the instrumentation. A reliability layer that reports an incomplete trace as complete is worse than no layer, because it manufactures confidence. So every span carries a ballast.trace.completeness flag, and each adapter declares what it can actually observe. "Partial" is a first-class answer.
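One way an adapter's declaration could translate into that flag is sketched below. The Capability names, the Adapter shape, the assess helper, and the 'complete' / 'partial' values are assumptions made for illustration; the article only establishes that the flag exists and that adapters declare what they can observe.

```typescript
// Things the article says can slip past instrumentation.
type Capability = 'retries' | 'streamingPartials' | 'toolCalls';
const ALL_CAPABILITIES: readonly Capability[] = ['retries', 'streamingPartials', 'toolCalls'];

// Hypothetical adapter declaration: a name plus what it can actually see.
interface Adapter {
  name: string;
  observes: readonly Capability[];
}

interface Assessment {
  completeness: 'complete' | 'partial';
  unobserved: Capability[];
}

// A trace is only "complete" when nothing is outside the adapter's view;
// otherwise report "partial" and say exactly what was not observed.
function assess(adapter: Adapter): Assessment {
  const unobserved = ALL_CAPABILITIES.filter((c) => adapter.observes.indexOf(c) === -1);
  return {
    completeness: unobserved.length === 0 ? 'complete' : 'partial',
    unobserved,
  };
}

const rawSdk = assess({ name: 'raw-sdk', observes: ['retries', 'streamingPartials', 'toolCalls'] });
const blackBox = assess({ name: 'black-box-runtime', observes: ['toolCalls'] });
```

Defaulting to "partial" whenever anything is unobserved keeps the failure mode honest: the layer can under-claim, but it cannot report a complete trace it did not actually see.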

The contracts-guardrails-bounded-loops discipline isn't theoretical. It's what kept agent platforms I've run in production from drifting: the difference between an agent that ships a clean statement of work and one that quietly invents a clause nobody catches until a customer does. ballast is that discipline pulled out of internal tooling and rebuilt as something standards-based and small enough to drop into anyone's stack.
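Two pieces of that discipline, a guardrail and a bounded loop, fit in a few lines. This is a rough sketch rather than ballast's implementation: evidenceCheck is a hypothetical stand-in (not the library's evidenceGuardrail), and its citation heuristic, the GuardrailResult shape, and boundedLoop are all assumptions.

```typescript
interface GuardrailResult {
  name: string;
  passed: boolean;
  detail: string;
}

// Hypothetical evidence check: count distinct citation markers such as [1]
// or bare URLs, and fail when the answer cites fewer than minSources.
function evidenceCheck(text: string, minSources: number = 1): GuardrailResult {
  const found: string[] = text.match(/\[\d+\]|https?:\/\/\S+/g) || [];
  const distinct = found.filter((marker, i) => found.indexOf(marker) === i);
  return {
    name: 'evidence',
    passed: distinct.length >= minSources,
    detail: `${distinct.length} source marker(s) found`,
  };
}

interface LoopOutcome {
  iterations: number;
  finished: boolean;
}

// Bounded loop: run step until it reports success, but never more than
// maxIterations times, and record how the loop actually ended.
function boundedLoop(step: (iteration: number) => boolean, maxIterations: number): LoopOutcome {
  for (let i = 1; i <= maxIterations; i++) {
    if (step(i)) return { iterations: i, finished: true };
  }
  return { iterations: maxIterations, finished: false };
}

const uncited = evidenceCheck('The contract renews annually.');
const cited = evidenceCheck('The contract renews annually [1].');
const gaveUp = boundedLoop(() => false, 19);
const converged = boundedLoop((i) => i === 3, 19);
```

Both helpers return a result instead of throwing, so the outcome can be recorded on a span whether or not the check passed.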

This is the MVP: a TypeScript SDK, the protocol, a local JSONL store, and a CLI viewer. The Python SDK and eval-as-gates (running a prompt across several models and gating on the result) are the next slices, and the schema already carries them.

The repo is MIT-licensed, with thirty tests, built on OTel. Clone it, run npm run example, and watch a span land in ballast runs. Then wrap one of your own calls and see what your traces haven't been telling you.
