{"slug": "per-user-cost-attribution-for-your-ai-app", "title": "Per-user cost attribution for your AI app", "summary": "This article shows how to track AI API costs per individual user by attaching a `userId` tag to every LLM call. It presents three methods: using wrapper SDKs like `@voightxyz/openai` with `withTrace`, leveraging the Vercel AI SDK's `experimental_telemetry.metadata`, or manually emitting events for background workers. The key insight is that tagging requests at the boundary allows cost attribution to propagate automatically, enabling developers to identify which users drive their OpenAI or Anthropic bills.", "body_md": "You ship your AI feature. It works. A week later your OpenAI bill is $400 and you have no idea which of your users caused which $0.05.\nCost per end-user is the single most underrated metric in production LLM apps, and it's surprisingly easy to instrument if you know what to do.\nHere are the three approaches I've found work in practice, ranked by setup time.\nApproach 1: Wrap your provider client (5 minutes)\nWorks for Express, Next.js Route Handlers, Fastify: anything that has a single OpenAI or Anthropic client instance.\n\n``` js\nimport OpenAI from 'openai'\nimport { wrapOpenAI, withTrace } from '@voightxyz/openai'\n\nconst openai = wrapOpenAI(new OpenAI(), {\n  agent: 'production-chat-api',\n})\n\n// `app` is your existing Express app; `req.user` is assumed to be set by auth middleware.\napp.post('/api/chat', async (req, res) => {\n  await withTrace(\n    async () => {\n      const r = await openai.chat.completions.create({\n        model: 'gpt-4o-mini',\n        messages: req.body.messages,\n      })\n      res.json({ reply: r.choices[0].message })\n    },\n    {\n      routeTag: 'POST /api/chat',\n      // Tags set here apply to every LLM call made inside the callback.\n      tags: {\n        userId: req.user.id,\n        plan: req.user.plan,\n      },\n    },\n  )\n})\n```\n\nThe trick is passing `{ tags: { userId } }` to `withTrace` at the request boundary. Every LLM call inside the block, direct or nested, inherits those tags automatically via `AsyncLocalStorage`. 
You don't have to thread `userId` through every function.\nPros: simplest setup; works with both OpenAI and Anthropic the same way.\nCons: requires you to use the dedicated wrapper SDKs.\nApproach 2: OpenTelemetry metadata (Vercel AI SDK)\nIf you're on the Vercel AI SDK, `experimental_telemetry.metadata` is the equivalent hook:\n\n``` ts\nimport { openai } from '@ai-sdk/openai'\nimport { streamText } from 'ai'\n\nexport async function POST(req: Request) {\n  // `session` is assumed to come from your auth layer; resolve it before this call.\n  const { prompt } = await req.json()\n  const result = streamText({\n    model: openai('gpt-4o-mini'),\n    prompt,\n    experimental_telemetry: {\n      isEnabled: true,\n      metadata: {\n        userId: session.user.id,\n        plan: session.user.plan,\n      },\n    },\n  })\n  // toDataStreamResponse() is the AI SDK 4.x method; the name differs in other major versions.\n  return result.toDataStreamResponse()\n}\n```\n\nThese values become `ai.telemetry.metadata.<key>` span attributes that any OpenTelemetry-compatible observability tool (Langfuse, Phoenix, Voight, Braintrust, Datadog) picks up.\nPros: zero coupling. It's pure OTel, so you can swap exporters whenever.\nCons: only works if your SDK emits OTel spans. The AI SDK does. 
Many others don't yet.\nApproach 3: Raw event emission (autonomous bots / non-HTTP)\nFor background workers, agents calling LLMs in loops, or anything that doesn't have a request boundary, emit events manually:\n\n``` js\nimport { Voight } from '@voightxyz/sdk'\n\nconst voight = new Voight({ agentId: 'my-bot' })\n\n// `job` is assumed to be the queue item being processed, carrying userId and tenantId.\nconst t0 = Date.now()\nconst res = await fetch('https://api.openai.com/v1/chat/completions', {\n  method: 'POST',\n  headers: {\n    'content-type': 'application/json',\n    authorization: `Bearer ${process.env.OPENAI_API_KEY}`,\n  },\n  body: JSON.stringify({\n    model: 'gpt-4o-mini',\n    messages: [...],\n  }),\n}).then((r) => r.json())\n\n// Token counts are read from the usage block of the API response.\nvoight.log({\n  type: 'reasoning',\n  model: 'gpt-4o-mini',\n  durationMs: Date.now() - t0,\n  outcome: 'success',\n  metadata: {\n    tokens: {\n      input: res.usage.prompt_tokens,\n      output: res.usage.completion_tokens,\n    },\n    tags: {\n      userId: job.userId,\n      tenantId: job.tenantId,\n    },\n  },\n})\n```\n\nThis is more code per call, but you control everything. Useful when the LLM call doesn't fit cleanly inside a wrapper (e.g. you're proxying through your own router).\nPros: full control over what gets emitted.\nCons: more boilerplate. You're responsible for reading the token counts off each response and reporting them.\nWhat you can answer once `userId` is in your tags\nOnce `tags.userId` (or whatever you name it) is on every event, the questions you can answer change shape:\nYou don't need a separate analytics SDK on the client. You don't need to copy `userId` into LLM messages. You don't need anything custom on top: the tags propagate from the request boundary down to every span.\nA note on GDPR / multi-tenant safety\n`userId` here means your internal stable identifier (`user_a3f9c2` or whatever), not the user's email or wallet. Never put PII into telemetry metadata. The good observability tools scrub PII anyway, but garbage-in is still garbage.\nFor multi-tenant SaaS, add a second tag: `tags: { userId, tenantId }`. 
That way you can ask both \"which customer is this?\" and \"which of their users?\".\nWrapping up\nThree approaches, one mental model: stamp `userId` at the boundary and let it propagate to every LLM call inside the request.\nThe wrappers I used here are Apache 2.0:\n- @voightxyz/openai for OpenAI\n- @voightxyz/anthropic for Anthropic\n- @voightxyz/vercel-ai for the Vercel AI SDK\n- @voightxyz/sdk for library mode\nSame approach works with Langfuse, Phoenix, Braintrust, or your existing OTel pipeline: the `metadata.userId` pattern is the universal part.\nHow do you currently track per-user spend in your AI app? Stripe metering? Server logs? Or have you been flying blind?", "url": "https://wpnews.pro/news/per-user-cost-attribution-for-your-ai-app", "canonical_source": "https://dev.to/seenfinity/per-user-cost-attribution-for-your-ai-app-16o", "published_at": "2026-05-21 23:39:13+00:00", "updated_at": "2026-05-22 00:04:15.138108+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "products", "enterprise-software"], "entities": ["OpenAI", "Anthropic", "Vercel", "VoightXYZ", "GPT-4o-mini"], "alternates": {"html": "https://wpnews.pro/news/per-user-cost-attribution-for-your-ai-app", "markdown": "https://wpnews.pro/news/per-user-cost-attribution-for-your-ai-app.md", "text": "https://wpnews.pro/news/per-user-cost-attribution-for-your-ai-app.txt", "jsonld": "https://wpnews.pro/news/per-user-cost-attribution-for-your-ai-app.jsonld"}}