# Tracking token usage across OpenAI, Anthropic, and Gemini: every streaming gotcha I hit

OpenAI, Anthropic, and Gemini each report token usage differently, and it stops being trivia the moment you track LLM cost.

I build Spanlens, an open-source LLM observability tool that sits in front of all three as a proxy and records every call with its model, latency, tokens, and cost. To do the cost part I read the token usage back out of every response, including the streaming ones.

I assumed the three providers would report usage in roughly the same way. They send the same kind of data, after all: input tokens, output tokens, maybe a cached count. How different could it be?

Pretty different, it turns out. Here is the whole thing in one table, then each gotcha in detail with the real parser code from the repo.

| Provider | Where usage lives (streaming) | Cache accounting | Field names |
|---|---|---|---|
| OpenAI | final chunk, needs `stream_options: { include_usage: true }` | `prompt_tokens` includes cache | `prompt_tokens` / `completion_tokens` |
| Anthropic | split across `message_start` + `message_delta` | `input_tokens` excludes cache, so add it | `input_tokens` / `output_tokens` |
| Gemini | `usageMetadata`, two stream formats | not applicable | `promptTokenCount` / `candidatesTokenCount` |

For a non-streaming call this is boring. Every provider hands you a usage object on the response body and you read it. Streaming is where it gets weird, because the token counts are not in the content chunks.
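
To make the "boring" non-streaming case and the cache-accounting column concrete, here is a minimal sketch of reading usage off a response body for all three providers. It is not the Spanlens parser: `NormalizedUsage`, `Provider`, and `normalizeUsage` are names made up for illustration, and the Anthropic cache fields (`cache_read_input_tokens`, `cache_creation_input_tokens`) come from the public API rather than from anything above.

```typescript
// Hypothetical common shape for illustration; not taken from the Spanlens repo.
interface NormalizedUsage {
  inputTokens: number;  // all prompt-side tokens, cached ones included
  outputTokens: number;
}

type Provider = "openai" | "anthropic" | "gemini";

// `body` is the parsed JSON of a non-streaming response.
function normalizeUsage(
  provider: Provider,
  body: Record<string, any>,
): NormalizedUsage | null {
  switch (provider) {
    case "openai": {
      const u = body.usage;
      if (!u) return null;
      // prompt_tokens already includes cached tokens, so use it as-is.
      return {
        inputTokens: u.prompt_tokens ?? 0,
        outputTokens: u.completion_tokens ?? 0,
      };
    }
    case "anthropic": {
      const u = body.usage;
      if (!u) return null;
      // input_tokens excludes cached tokens, so add cache reads and writes back.
      const cached =
        (u.cache_read_input_tokens ?? 0) + (u.cache_creation_input_tokens ?? 0);
      return {
        inputTokens: (u.input_tokens ?? 0) + cached,
        outputTokens: u.output_tokens ?? 0,
      };
    }
    case "gemini": {
      const u = body.usageMetadata;
      if (!u) return null;
      return {
        inputTokens: u.promptTokenCount ?? 0,
        outputTokens: u.candidatesTokenCount ?? 0,
      };
    }
  }
}
```

The sketch only counts tokens. For cost you would likely keep the cached portion as a separate number too, since cached tokens are usually priced differently. Streamed token counts are another matter.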
They show up somewhere else, and "somewhere else" is different for each provider.

OpenAI puts the usage in a final chunk, after all the content, right before `[DONE]`. You only get it if you ask for it with `stream_options: { include_usage: true }`. Miss that flag and you stream the whole response and end up with no usage at all.

`export function parseOpenAIStreamChunk(line: string): Partial`