Per-user cost attribution for your AI app

This article shows how to track AI API costs per individual user by attaching a `userId` tag to every LLM call. It presents three methods: using wrapper SDKs like `@voightxyz/openai` with `withTrace`, leveraging the Vercel AI SDK's `experimental_telemetry.metadata`, or manually emitting events from background workers. The key insight is that tagging requests at the boundary lets cost attribution propagate automatically, so developers can identify which users drive their OpenAI or Anthropic bills.

You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which of your users caused which $0.05.

Cost per end-user is the single most underrated metric in production LLM apps, and it's surprisingly easy to instrument if you know what to do. Here are the three approaches I've found work in practice, ranked by setup time.

## Approach 1: Wrap your provider client (5 minutes)

Works for Express, Next.js Route Handlers, Fastify: anything that has a single OpenAI or Anthropic client instance.

```js
import OpenAI from 'openai'
import { wrapOpenAI, withTrace } from '@voightxyz/openai'

const openai = wrapOpenAI(new OpenAI(), {
  agent: 'production-chat-api',
})

// `app` is your existing Express app; `req.user` is set by your auth middleware.
app.post('/api/chat', async (req, res) => {
  await withTrace(
    async () => {
      const r = await openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: req.body.messages,
      })
      res.json({ reply: r.choices[0].message })
    },
    {
      routeTag: 'POST /api/chat',
      tags: {
        userId: req.user.id,
        plan: req.user.plan,
      },
    },
  )
})
```

The trick is passing `{ tags: { userId } }` to `withTrace` at the request boundary. Every LLM call inside the block, direct or nested, inherits those tags automatically via `AsyncLocalStorage`. You don't have to thread `userId` through every function.

- Pros: simplest.
- Pros: works with both OpenAI and Anthropic the same way.
- Cons: requires you to use the dedicated wrapper SDKs.

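
If you're curious how that kind of inheritance can work, here is a minimal sketch of the idea using Node's built-in `AsyncLocalStorage`. It is not the wrapper's actual implementation: `withTags`, `fakeLlmCall`, and `summarize` are made-up names for illustration, and the "LLM call" is just a timer.

```javascript
import { AsyncLocalStorage } from 'node:async_hooks'

// Holds the tags for whichever request is currently executing.
const traceStore = new AsyncLocalStorage()

// Hypothetical stand-in for a tracing helper: run fn with tags in scope.
function withTags(tags, fn) {
  return traceStore.run({ tags }, fn)
}

// Hypothetical stand-in for a wrapped client call: it reads the tags
// from the store instead of receiving them as arguments.
const recorded = []
async function fakeLlmCall(model) {
  await new Promise((resolve) => setTimeout(resolve, 5)) // simulate network I/O
  const tags = traceStore.getStore()?.tags ?? {}
  recorded.push({ model, ...tags })
}

// A nested helper that never sees userId in its signature.
async function summarize() {
  await fakeLlmCall('gpt-4o-mini')
}

// Two overlapping "requests" each keep their own tags.
await Promise.all([
  withTags({ userId: 'user_a' }, summarize),
  withTags({ userId: 'user_b' }, summarize),
])

console.log(recorded)
```

Each recorded entry carries the `userId` of the request that triggered it, even though the two requests overlap and `summarize` takes no arguments. That is the property that lets you tag once at the boundary.
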
## Approach 2: OpenTelemetry telemetry metadata (Vercel AI SDK)

If you're on the Vercel AI SDK, `experimental_telemetry.metadata` is the equivalent hook:

```ts
import { openai } from '@ai-sdk/openai'
import { streamText } from 'ai'

export async function POST(req: Request) {
  // `session` is assumed to come from your auth layer.
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt: (await req.json()).prompt,
    experimental_telemetry: {
      isEnabled: true,
      metadata: {
        userId: session.user.id,
        plan: session.user.plan,
      },
    },
  })
  // Newer AI SDK releases rename this method (for example, toDataStreamResponse in 4.x).
  return result.toAIStreamResponse()
}
```

The metadata lands on `ai.telemetry.metadata.<key>` span attributes that any OpenTelemetry-compatible observability tool (Langfuse, Phoenix, Voight, Braintrust, Datadog) picks up.

- Pros: zero coupling. It's pure OTel, so you can swap exporters whenever.
- Cons: only works if your SDK emits OTel spans. AI SDK does. Many others don't yet.

## Approach 3: Raw event emission (autonomous bots / non-HTTP)

For background workers, agents calling LLMs in loops, or anything that doesn't have a request boundary, emit events manually:

```js
import { Voight } from '@voightxyz/sdk'

const voight = new Voight({ agentId: 'my-bot' })

const t0 = Date.now()
const res = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    'content-type': 'application/json', // required for a JSON body
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [/* ... */],
  }),
}).then((r) => r.json())

// `job` is the background job being processed; it carries the user and tenant IDs.
voight.log({
  type: 'reasoning',
  model: 'gpt-4o-mini',
  durationMs: Date.now() - t0,
  outcome: 'success',
  metadata: {
    tokens: {
      input: res.usage.prompt_tokens,
      output: res.usage.completion_tokens,
    },
    tags: {
      userId: job.userId,
      tenantId: job.tenantId,
    },
  },
})
```

This is more code per call, but you control everything. It's useful when the LLM call doesn't fit cleanly inside a wrapper (e.g. you're proxying through your own router).

- Pros: full control over what gets emitted.
- Cons: more boilerplate. You're responsible for token counting.

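
When you emit events by hand, turning token counts into dollars is on you as well. Here is a rough sketch of a per-user roll-up, assuming events shaped like the manual `voight.log` payload above (`model`, `metadata.tokens`, `metadata.tags`). The `PRICE_PER_MILLION` table, the `example-model` name, and the helper names are hypothetical, and the prices are placeholders rather than real rates.

```javascript
// Placeholder prices in USD per 1M tokens. These numbers are illustrative,
// not real rates: substitute your provider's current price list.
const PRICE_PER_MILLION = {
  'example-model': { input: 1.0, output: 2.0 },
}

// Convert one event's token counts into dollars.
function eventCostUsd(event) {
  const price = PRICE_PER_MILLION[event.model]
  if (!price) throw new Error(`No price configured for ${event.model}`)
  const { input, output } = event.metadata.tokens
  return (input * price.input + output * price.output) / 1_000_000
}

// Sum cost per userId across a batch of events.
function costByUser(events) {
  const totals = new Map()
  for (const event of events) {
    const userId = event.metadata.tags?.userId ?? 'unknown'
    totals.set(userId, (totals.get(userId) ?? 0) + eventCostUsd(event))
  }
  return totals
}

// Sample events, invented for this example.
const events = [
  { model: 'example-model', metadata: { tokens: { input: 1000, output: 500 }, tags: { userId: 'user_a' } } },
  { model: 'example-model', metadata: { tokens: { input: 2000, output: 0 }, tags: { userId: 'user_b' } } },
  { model: 'example-model', metadata: { tokens: { input: 1000, output: 500 }, tags: { userId: 'user_a' } } },
]

const totals = costByUser(events)
console.log(Object.fromEntries(totals))
```

Throwing on an unknown model is a deliberate choice in this sketch: a silently missing price would make a user look free.
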
## What you can answer once userId is in your tags

Once `tags.userId` (or whatever you name it) is on every event, the questions you can answer change shape: instead of one opaque total, you can see which users drive the bill.

You don't need a separate analytics SDK on the client. You don't need to copy `userId` into LLM messages. You don't need anything custom on top: the tags propagate from the request boundary down to every span.

## A note on GDPR / multi-tenant safety

`userId` here means your internal stable identifier (`user_a3f9c2` or whatever), not the user's email or wallet. Never put PII into telemetry metadata. The good observability tools scrub PII anyway, but garbage-in is still garbage.

For multi-tenant SaaS, add a second tag: `tags: { userId, tenantId }`. That way you can ask both "which customer is this?" and "which of their users?".

## Wrapping up

Three approaches, one mental model: stamp `userId` at the boundary, let it propagate to every LLM call inside the request.

The wrappers I used here are Apache 2.0:

- `@voightxyz/openai` for OpenAI
- `@voightxyz/anthropic` for Anthropic
- `@voightxyz/vercel-ai` for the Vercel AI SDK
- `@voightxyz/sdk` for library mode

The same approach works with Langfuse, Phoenix, Braintrust, or your existing OTel pipeline. The `metadata.userId` pattern is the universal part.

How do you currently track per-user spend in your AI app? Stripe metering? Server logs? Or have you been flying blind?