Reading Observability Tools? That's a Robot's Job

Last Thursday Honeycomb was kind enough to hand me the closing keynote slot at O11yCon, a conference whose entire premise is that observability matters and the people in the room are the ones who would know. I used that slot to argue that the primary reader of your telemetry is no longer a human, and that almost every design decision in the observability stack (including the framework most of the industry uses to describe its own job) was built for a reader who isn't sitting in the chair anymore.

I should probably explain a bit.

How We Got Here

About two weeks before the talk, my co-founder Mike Julian—the other half of Duckbill, the company that pays me to have opinions about cloud bills—went looking on the Anthropic dashboard for the per-user cost view that exists on every other SaaS product humans have ever bought. He could not find it. He asked the internet.

The answer came back from someone in customer-facing engineering at Anthropic, and it was… less than it could have been. There is no per-user cost view. Plans are flat per-seat. If you want cost estimates per user, you can—and I want you to feel the weight of this phrase the way I did when I read it—"hook up OTel."

The answer to a billing question was: get the OpenTelemetry SDK, wire up a collector, pipe the metrics into a backend, and build the dashboard you wish we had.
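For a sense of what "build the dashboard you wish we had" boils down to once the telemetry lands somewhere, here is a minimal sketch. It assumes each exported record carries a user identifier and input/output token counts; the field names (`user_id`, `input_tokens`, `output_tokens`), the user IDs, and the prices are all hypothetical placeholders, not anything Anthropic or OpenTelemetry publishes.

```python
from collections import defaultdict

# Hypothetical prices in dollars per million tokens; substitute your real rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def cost_per_user(records):
    """Sum an estimated dollar cost per user from exported usage records."""
    totals = defaultdict(float)
    for r in records:
        cost = (
            r["input_tokens"] * PRICE_PER_MTOK["input"]
            + r["output_tokens"] * PRICE_PER_MTOK["output"]
        ) / 1_000_000
        totals[r["user_id"]] += cost
    return dict(totals)


# Invented sample records standing in for whatever your backend exports.
records = [
    {"user_id": "user-a", "input_tokens": 1_000_000, "output_tokens": 200_000},
    {"user_id": "user-b", "input_tokens": 500_000, "output_tokens": 0},
    {"user_id": "user-a", "input_tokens": 0, "output_tokens": 100_000},
]

per_user = cost_per_user(records)
```

The arithmetic is the easy part. The SDK, the collector, and the backend that get the records to this point are the work the vendor just handed you.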

This is not the fault of the engineer who replied. She gave the customer-service-approved answer her org had set her up to give. The answer her company put in her mouth, though, is the most interesting / horrifying thing I've read on X The Everything App® this year, because it's going to be the right answer more often, not less. And that's not because vendors are lazy, but because the questions are getting harder, faster than the dashboards can keep up.

So everyone in that room—and most of you reading this—is about to wire up a lot of OTel.

The rest of the talk was about what happens after you do.

The Three Pillars, Translated Into English

Charity Majors has been arguing for the better part of a decade that "metrics, logs, and traces" is a marketing framework that observability vendors love and engineers should be suspicious of. She is, as is her habit, correct. Austin Parker made the same argument from the OpenTelemetry side last year: the three-pillars framing treats telemetry as three siloed signal streams when the whole point of OTel is that they're unified by shared distributed context. Honeycomb has been saying out loud, for a while now, that the pillars metaphor was architecturally wrong from the start.

My argument is narrower and meaner: even if you didn't buy that one on the merits, the audience for telemetry has changed in a way that breaks the framing for a second, independent reason. So now it's wrong twice.

Put more bluntly than the marketing pages can:

Metrics are for the human who wants to confirm a hunch.

Logs are for the human who has run out of better ideas.

Traces are for the human who finally gave up on the first two.

All three were designed for the same reader: A human. With eyes, intuition, and the ability to skim a dashboard and notice that the orange line is doing something the orange line should absolutely not be doing. And critically: with a finite amount of time, which means the visualization mattered as much as the underlying data.

That reader has been quietly replaced, and the seat is mostly empty now.

What's increasingly reading the telemetry is a process. It does not have eyes, so it can't skim, and it can't tell at a glance that the orange line going up is bad and the green line going up is good. It does not understand your color scheme, a surprisingly common sentiment given most companies' "interesting" choices regarding color palette. It cannot appreciate the dashboard you cleverly built that arranges six panels in a way that tells a story to a senior engineer (who was likewise unappreciative of your hard work, but that's senior engineers for you).

But it can read attributes, follow trace IDs, connect dots, and of course it can grep, very quickly, across an enormous amount of structured data.

That is a profile of a reader, but notably it is not the reader any of these tools were built to serve.

The Two Pillars That Lose

Metrics: it's effectively an agentic dead end. A graph that says "errors went up at 3pm" is useful information for a human, who looks at it, mutters something uncharitable about whoever was on deploy duty, and goes to investigate. For an agent, the same graph is the end of the trail. It is an aggregate and thus inherently lossy. The shape of the data the agent could have followed has been crushed flat by a mathematical wheat thresher into a count. The metric does not remember which spans contributed to it. Now sure, metrics will continue to page you, which is arguably useful though annoying. They will continue to anchor SLOs, which is also useful. But they're no longer the front door.
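To make the "crushed flat" point concrete, here is a small sketch with invented spans. The field names (`hour`, `error`, `build_id`) and every value are assumptions made up for illustration. The same data is read twice: once as a metric, once as the spans themselves.

```python
from collections import Counter

# Hypothetical spans; every field name and value here is invented.
spans = [
    {"trace_id": "t1", "hour": 14, "error": False, "build_id": "b41"},
    {"trace_id": "t2", "hour": 15, "error": True, "build_id": "b42"},
    {"trace_id": "t3", "hour": 15, "error": True, "build_id": "b42"},
    {"trace_id": "t4", "hour": 15, "error": False, "build_id": "b41"},
    {"trace_id": "t5", "hour": 15, "error": True, "build_id": "b42"},
]

# The metric: one number per hour. This is all a graph has to work with.
errors_per_hour = Counter(s["hour"] for s in spans if s["error"])

# The spans: same question, but the attributes survive, so there is a next step.
failing = [s for s in spans if s["error"] and s["hour"] == 15]
errors_by_build = Counter(s["build_id"] for s in failing)
failing_trace_ids = [s["trace_id"] for s in failing]
```

`errors_per_hour` can only say that hour 15 had three errors. `errors_by_build` and `failing_trace_ids` come from the same data, but they are unrecoverable once only the count has been kept.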

Logs: the free-text confessional optimized for the moment a human gave up trying to find structure. Structured spans already have everything the logs were meant to surface—and they are typed, queryable, and have a known schema instead of an emergent one. I haven't opened a log dashboard in three weeks. If you haven't either and have been quietly feeling guilty about it, I am giving you permission to knock that off. You are not lazy. You are early.

The Pillar That Wins, And How You're About to Break It

Traces win. That is the entire thesis of the talk, because everything else just got unceremoniously demoted.

Which brings me to the paragraph I want every observability vendor and every engineer instrumenting a service to read twice:

Your span names and attribute names are now a public API. They have consumers. The consumers are not patient when you ship breaking changes. If you rename fast_path_hit to fast_path_taken in a refactor PR, every prompt anywhere in your system that mentions fast_path_hit now silently disagrees with reality. There is no compiler that will catch that, because the compiler thinks these are just strings in a JSON blob. The agent reading them is not your patient senior engineer who can figure it out from context. The agent will be confidently wrong instead.
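Since no compiler will catch it, one option is a cheap check of your own. The sketch below assumes you keep a rename changelog for telemetry attributes and can enumerate your prompts as text. `RENAMES`, `PROMPTS`, and `stale_references` are hypothetical names for illustration, not part of any library.

```python
import re

# Hypothetical changelog of attribute renames: old name -> new name.
RENAMES = {"fast_path_hit": "fast_path_taken"}

# Hypothetical prompts that tell an agent which attributes to look at.
PROMPTS = {
    "triage": "If fast_path_hit is false, inspect sender_domain on the failing span.",
    "latency": "Group slow requests by route before reading individual traces.",
}


def stale_references(prompts, renames):
    """Return (prompt_name, old_attr, new_attr) for every retired name still in use."""
    findings = []
    for prompt_name, text in prompts.items():
        for old, new in renames.items():
            # Word boundaries keep fast_path_hit from matching fast_path_hit_count.
            if re.search(rf"\b{re.escape(old)}\b", text):
                findings.append((prompt_name, old, new))
    return findings


stale = stale_references(PROMPTS, RENAMES)
```

Run in CI, a non-empty `stale` list fails the refactor PR before the agent has a chance to be confidently wrong about it.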

If you wouldn't ship a rename to your public API without a changelog, don't do it to your traces. Version them. Review them. Stabilize them. The runbook is not the documentation anymore, because the trace structure is.

And while we're upending best practices that have outlived their time: the traditional cardinality story (keep your high-cardinality fields under control, watch your unique-value counts) was a constraint we invented because dashboards have screens and screens have pixels and a graph with ten thousand unique series on it is unreadable to a person.

The agent does not look at the graph. The agent reads the underlying spans. A high-cardinality field—say, the sender domain on every email a service processes—was previously a thing I would have suppressed, because a chart broken down by sender domain was useless. Now, when Claude is trying to figure out why a specific person's emails are failing, that field is the single most useful attribute on the trace.

High cardinality, when the reader is an agent, is more valuable, not less. We have been pruning the parts of our telemetry that were most useful to the new reader, for the comfort of the old one.
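Here is roughly what "following the thread" looks like when the sender domain is on the span. This is a sketch over invented data: the span shape, the attribute names (`sender_domain`, `status`, `error_type`), and the helper `failing_traces_for` are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical spans from an email pipeline; all names and values are invented.
spans = [
    {"trace_id": "a1", "name": "receive_email", "sender_domain": "example.org", "status": "ok"},
    {"trace_id": "a1", "name": "spam_check", "sender_domain": "example.org",
     "status": "error", "error_type": "dkim_lookup_timeout"},
    {"trace_id": "b2", "name": "receive_email", "sender_domain": "example.com", "status": "ok"},
    {"trace_id": "b2", "name": "spam_check", "sender_domain": "example.com", "status": "ok"},
]


def failing_traces_for(spans, domain):
    """Return every span of each trace that contains an error for this sender domain."""
    by_trace = defaultdict(list)
    for s in spans:
        by_trace[s["trace_id"]].append(s)
    failing_ids = {
        s["trace_id"]
        for s in spans
        if s.get("sender_domain") == domain and s["status"] == "error"
    }
    return {tid: by_trace[tid] for tid in sorted(failing_ids)}


thread = failing_traces_for(spans, "example.org")
```

A chart with one series per sender domain would be unreadable. A filter on one sender domain is a single comparison, and it hands back the whole trace around the failure.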

(Charity has been saying the "wide events, high cardinality, ask any question" version of this for years. The reframing I'd offer is that the argument has gone from a best practice to a survival trait. The people who took her advice seriously have less work to do this year than the people who didn't.)

The Receipt

I run sixteen Honeycomb datasets across a Kubernetes cluster in my sewing room, a fleet of Lambdas, an AWS account called Superfund (which is both toxic and expensive), and some Vercel services—none of which I wrote a line of Terraform for. Claude wrote and operates all of it. The only reader looking at the telemetry most of the time is also Claude, sometimes with me screaming at it.

The day a Honeycomb engineer came in to clean up my account, the first thing she found was 9,176—the number of times in twenty-four hours a Kubernetes liveness probe fired into my pipeline so a load balancer could feel seen. I was paying to ingest the world's most boring conversation between a cluster and itself.

I had built the observability equivalent of a stadium where everyone had hiccups. And the reader I'd built it for had already left the building.
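For scale, 9,176 probes in twenty-four hours is one roughly every nine seconds. Below is a minimal sketch of the kind of filter that would have kept them out of the pipeline. It assumes spans expose the request's user agent and path under `user_agent.original` and `url.path`, and that probes announce themselves with a `kube-probe/` user agent. The health-check paths and sample spans are invented.

```python
# Hypothetical health-check paths; adjust to whatever your services expose.
HEALTH_PATHS = {"/healthz", "/livez", "/readyz"}


def is_probe(span):
    """Guess whether a span came from a liveness or readiness probe."""
    user_agent = span.get("user_agent.original", "")
    return user_agent.startswith("kube-probe/") or span.get("url.path") in HEALTH_PATHS


def drop_probes(spans):
    """Keep only the spans worth paying to ingest."""
    return [s for s in spans if not is_probe(s)]


# Invented sample spans.
spans = [
    {"name": "GET /healthz", "url.path": "/healthz", "user_agent.original": "kube-probe/1.30"},
    {"name": "POST /send", "url.path": "/send", "user_agent.original": "curl/8.5.0"},
    {"name": "GET /livez", "url.path": "/livez"},
]

kept = drop_probes(spans)

# 9,176 probes in 24 hours works out to one about every 9.4 seconds.
seconds_between_probes = (24 * 60 * 60) / 9176
```

In practice this decision usually belongs upstream of application code, in a sampler or in the collector's filtering stage. The list comprehension is only there to show the predicate.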

The Closer

The next twenty years of observability are not going to be about prettier dashboards. They are going to be about what your trace structure says to a non-human reader at three in the morning, when you are asleep, and your agent is the only thing standing between your customers and an outage you will hear about on Monday.

Build for that reader. Document for that reader. Stop renaming your spans in freaking refactor PRs. And treat your cardinality budget as something you spend on that reader's ability to follow the thread, not something you starve to keep a graph readable.

I don't use Honeycomb anymore. My AI does.

Try not to take it personally.

source & further reading

lastweekinaws.com — original article