
Lago Open-source SDK: Bill on top of your LLM token cost with no middleware

Lago has released an open-source Python SDK that wraps existing LLM clients, extracts token usage from each response, and sends it to Lago's billing platform without middleware or API changes. The SDK supports AWS Bedrock and Mistral with p99 overhead under 5 milliseconds. It buffers usage events in memory, flushes them in batches, and survives provider or Lago outages through exponential backoff. Developers can bill customers on LLM token consumption by attaching a subscription ID per call, per context, or as a default fallback.

3 min read · published May 27, 2026

Instrument LLM clients and emit usage events to Lago for billing.

                  ┌────────────────┐
your code ──────► │ wrapped client │ ──► provider (Bedrock / Mistral / …)
                  └───────┬────────┘
                          │ (extract usage)
                          ▼
                  ┌────────────────┐
                  │   Lago events  │ ──► api.getlago.com
                  └────────────────┘
  • Wraps your existing LLM client in place, with no API surface change for your application code.
  • Extracts usage from each response into a normalized shape (CanonicalUsage).
  • Buffers events in memory and flushes them in batches to Lago's /events/batch endpoint.
  • Survives provider and Lago outages with exponential backoff and a bounded buffer.
  • p99 wrap overhead under 5 ms; your call is never blocked on Lago.
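The buffering and retry behavior in the list above can be illustrated with a small, self-contained sketch. This is not the SDK's code: the class name `BoundedEventBuffer`, the `send_batch` callable, and the default buffer size, batch size, and retry parameters are all hypothetical. It shows a bounded deque that drops the oldest events when full, oldest-first batched flushes, and exponential backoff between failed send attempts.

```python
import itertools
import time
from collections import deque
from typing import Callable, Deque, Dict, List


class BoundedEventBuffer:
    """Hypothetical sketch: bounded in-memory buffer with batched, retried flushes."""

    def __init__(
        self,
        send_batch: Callable[[List[Dict]], None],
        max_events: int = 10_000,
        batch_size: int = 100,
        max_retries: int = 5,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # deque(maxlen=...) discards the oldest event once full, so memory
        # stays bounded even during a long outage.
        self._events: Deque[Dict] = deque(maxlen=max_events)
        self._send_batch = send_batch
        self._batch_size = batch_size
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep

    def add(self, event: Dict) -> None:
        self._events.append(event)

    def __len__(self) -> int:
        return len(self._events)

    def flush(self) -> None:
        """Send events oldest-first in batches; keep them if every retry fails."""
        while self._events:
            batch = list(itertools.islice(self._events, self._batch_size))
            if not self._send_with_backoff(batch):
                return  # leave the events buffered for the next flush
            for _ in batch:
                self._events.popleft()

    def _send_with_backoff(self, batch: List[Dict]) -> bool:
        for attempt in range(self._max_retries):
            try:
                self._send_batch(batch)
                return True
            except Exception:
                if attempt + 1 < self._max_retries:
                    # Exponential backoff: base_delay * 2**attempt (0.5 s, 1 s, 2 s, ...).
                    self._sleep(self._base_delay * 2 ** attempt)
        return False
```

Unlike the SDK, which flushes in the background, this sketch only sends when `flush()` is called and is not designed for concurrent use; a production version would drive `flush()` from a worker thread and guard the batch-then-pop sequence with a lock.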
pip install lago-agent-sdk

For Bedrock support: pip install 'lago-agent-sdk[bedrock]' (adds boto3).

For Mistral support: pip install 'lago-agent-sdk[mistral]' (adds mistralai).

import boto3
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(
    api_key="<YOUR_LAGO_API_KEY>",
    api_url="https://api.getlago.com/api/v1/",
    default_subscription_id="sub_acme",  # fallback when no other subscription is set
)
# Wrap the boto3 client once, then use it exactly as before.
client = sdk.wrap(boto3.client("bedrock-runtime", region_name="eu-west-1"))

resp = client.converse(
    modelId="eu.amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
# Send any buffered usage events to Lago now (useful in short-lived scripts).
sdk.flush()

The wrapped client behaves identically to the original: same arguments, same return shape, same exceptions. The SDK adds an in-memory queue that batches events to Lago in the background.
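One way to wrap a client without changing its API surface is a proxy object that forwards every attribute to the real client and intercepts only selected methods. The sketch below is hypothetical and is not how `sdk.wrap` is implemented: `UsageProxy`, the `record` callback, and intercepting only `converse` are illustrative choices. It reads the `usage` dict (`inputTokens`, `outputTokens`) that Bedrock's Converse API includes in its response, and always returns the provider's response unchanged.

```python
from typing import Any, Callable, Dict


class UsageProxy:
    """Hypothetical sketch of a transparent client wrapper."""

    def __init__(self, client: Any, record: Callable[[Dict[str, int]], None]) -> None:
        self._client = client
        self._record = record

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the proxy itself, so
        # everything else is forwarded to the wrapped client untouched.
        attr = getattr(self._client, name)
        if name != "converse" or not callable(attr):
            return attr

        def wrapped(*args: Any, **kwargs: Any) -> Any:
            response = attr(*args, **kwargs)  # provider exceptions propagate as-is
            try:
                usage = response.get("usage", {})
                self._record({
                    "input": int(usage.get("inputTokens", 0)),
                    "output": int(usage.get("outputTokens", 0)),
                })
            except Exception:
                pass  # instrumentation must never break the caller's request
            return response

        return wrapped
```

Because the provider call sits outside the `try` block, genuine provider errors still reach the caller, while a failure in usage extraction or recording is swallowed.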

from mistralai.client import Mistral
from lago_agent_sdk import LagoSDK

# Same pattern as Bedrock: wrap the native Mistral client once.
sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
client = sdk.wrap(Mistral(api_key="..."))

resp = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello"}],
)
sdk.flush()  # send buffered usage events now

Three ways to set the external_subscription_id, in priority order:

  • Per call: client.converse(..., extra_lago={"subscription": "sub_acme", "dimensions": {"feature": "summarize"}})
  • Per context: sdk.set_subscription("sub_acme"). Backed by contextvars for safe propagation across asyncio tasks.
  • Default fallback: sdk = LagoSDK(api_key="...", default_subscription_id="sub_default")
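The priority order above (per call, then per context, then default) can be sketched with the standard library's `contextvars` module. The names `resolve_subscription`, `set_subscription`, `_current_subscription`, and `DEFAULT_SUBSCRIPTION` are hypothetical stand-ins, not the SDK's internals. Because asyncio runs each task in a copy of the context that was current when the task was created, a value set inside one task does not leak into another.

```python
import asyncio
import contextvars
from typing import List, Optional

# Context-local subscription ID. asyncio copies the current context into each
# new task, so a value set inside one task is invisible to the others.
_current_subscription: "contextvars.ContextVar[Optional[str]]" = contextvars.ContextVar(
    "current_subscription", default=None
)

DEFAULT_SUBSCRIPTION = "sub_default"  # stands in for default_subscription_id


def set_subscription(subscription_id: str) -> None:
    _current_subscription.set(subscription_id)


def resolve_subscription(extra_lago: Optional[dict] = None) -> str:
    """Per-call value wins, then the context value, then the default."""
    if extra_lago and extra_lago.get("subscription"):
        return extra_lago["subscription"]
    return _current_subscription.get() or DEFAULT_SUBSCRIPTION


async def handle_request(subscription_id: str) -> str:
    set_subscription(subscription_id)  # scoped to this task's context
    await asyncio.sleep(0)             # yield so the two tasks interleave
    return resolve_subscription()


async def main() -> List[str]:
    return await asyncio.gather(handle_request("sub_acme"), handle_request("sub_globex"))
```

Running `asyncio.run(main())` returns `['sub_acme', 'sub_globex']`: each task sees only its own value, and code outside both tasks still falls back to the default.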

Provider access                                            Status
AWS Bedrock Converse (sync + stream)                       ✓
AWS Bedrock InvokeModel (sync + stream), 7 model families  ✓
Mistral native SDK (chat.complete + chat.stream)           ✓
OpenAI native SDK                                          Phase 2
Anthropic native SDK                                       Phase 2
Google Gemini native SDK                                   Phase 2
LiteLLM callback bridge                                    Phase 4
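For streaming calls, usage is only known once the stream ends. Bedrock's ConverseStream API reports it in a final `metadata` event whose `usage` field carries `inputTokens` and `outputTokens`. A hypothetical generator wrapper (not the SDK's code; `tee_usage` and `on_usage` are illustrative names) can pass every event through untouched and record usage when that event arrives:

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def tee_usage(
    events: Iterable[Dict[str, Any]],
    on_usage: Callable[[Dict[str, int]], None],
) -> Iterator[Dict[str, Any]]:
    """Yield stream events unchanged; report usage from the metadata event."""
    for event in events:
        if "metadata" in event:
            usage = event["metadata"].get("usage", {})
            try:
                on_usage({
                    "input": int(usage.get("inputTokens", 0)),
                    "output": int(usage.get("outputTokens", 0)),
                })
            except Exception:
                pass  # a failing recorder must not interrupt the caller's stream
        yield event
```

With boto3 you would pass `response["stream"]` from `converse_stream` as `events`. Usage is recorded only if the caller consumes the stream far enough to reach the metadata event.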

CanonicalUsage carries 10 numeric fields. Which ones are populated depends on the provider:

Field                      Lago metric code              Bedrock                    Mistral native
input                      llm_input_tokens              ✓                          ✓
output                     llm_output_tokens             ✓                          ✓
cache_read                 llm_cached_input_tokens       ✓ (Anthropic)              ✓ (when cache hits)
cache_write                llm_cache_creation_tokens     ✓ (Anthropic)              ✗
cache_write_5m / 1h        llm_cache_write_5m/1h_tokens  ✓ (Anthropic InvokeModel)  ✗
reasoning                  llm_reasoning_tokens          ✗ (folded into output)     ✗ (folded into output)
tool_calls                 llm_tool_calls                ✓                          ✓
image_input / audio_input  llm_image/audio_input_tokens  ✗                          ✗

The reasoning, image, and audio fields will be populated once native OpenAI support ships in Phase 2.
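To make the table concrete, here is a hypothetical stand-in for CanonicalUsage with the ten fields listed above, plus a helper that turns the non-zero fields into Lago events. The class name `Usage`, the helper `to_lago_events`, one event per metric, the expanded codes for the combined 5m/1h and image/audio rows, and the `value` property name are assumptions. The event keys (`transaction_id`, `external_subscription_id`, `code`, `timestamp`, `properties`) follow Lago's public events API.

```python
import time
import uuid
from dataclasses import dataclass, fields
from typing import Dict, List

# Field -> metric code, following the table above. The 5m/1h and image/audio
# codes are an assumed expansion of the table's combined rows.
METRIC_CODES: Dict[str, str] = {
    "input": "llm_input_tokens",
    "output": "llm_output_tokens",
    "cache_read": "llm_cached_input_tokens",
    "cache_write": "llm_cache_creation_tokens",
    "cache_write_5m": "llm_cache_write_5m_tokens",
    "cache_write_1h": "llm_cache_write_1h_tokens",
    "reasoning": "llm_reasoning_tokens",
    "tool_calls": "llm_tool_calls",
    "image_input": "llm_image_input_tokens",
    "audio_input": "llm_audio_input_tokens",
}


@dataclass
class Usage:
    """Hypothetical stand-in for CanonicalUsage: ten numeric fields, default 0."""
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    reasoning: int = 0
    tool_calls: int = 0
    image_input: int = 0
    audio_input: int = 0


def to_lago_events(usage: Usage, subscription_id: str) -> List[dict]:
    """Build one Lago event per non-zero field; unpopulated fields are skipped."""
    timestamp = int(time.time())
    events = []
    for f in fields(usage):
        value = getattr(usage, f.name)
        if value:
            events.append({
                "transaction_id": str(uuid.uuid4()),  # unique id per event
                "external_subscription_id": subscription_id,
                "code": METRIC_CODES[f.name],
                "timestamp": timestamp,
                "properties": {"value": value},  # property name is an assumption
            })
    return events
```

Skipping zero fields means a provider that never reports, say, reasoning tokens produces no `llm_reasoning_tokens` events at all rather than a stream of zeros.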

The SDK never breaks your LLM call. If any part of the instrumentation fails (an adapter bug, a Lago outage, a network error), the SDK swallows the exception, logs a warning, and your call returns normally.

This behavior is configurable through the LagoConfig.on_error callback, for example to report failures to Sentry or Datadog:

import sentry_sdk
from lago_agent_sdk import LagoConfig, LagoSDK

def on_error(exc: Exception, where: str) -> None:
    # `where` identifies the SDK phase that failed; report it as a Sentry tag.
    sentry_sdk.capture_exception(exc, tags={"sdk_phase": where})

sdk = LagoSDK(
    api_key="...",
    config=LagoConfig(api_key="...", on_error=on_error),
)
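The "never break the call" guarantee can be pictured as running each instrumentation step inside a guard that catches everything, logs a warning, and forwards the exception to an optional callback with the same `(exc, where)` shape as `on_error`. A minimal sketch, where `guarded`, the logger name, and the phase names are hypothetical:

```python
import logging
from typing import Callable, Optional, TypeVar

logger = logging.getLogger("lago_sketch")  # hypothetical logger name
T = TypeVar("T")


def guarded(
    where: str,
    step: Callable[[], T],
    on_error: Optional[Callable[[Exception, str], None]] = None,
) -> Optional[T]:
    """Run one instrumentation step; its failure never reaches the caller."""
    try:
        return step()
    except Exception as exc:
        logger.warning("usage instrumentation failed in %s: %r", where, exc)
        if on_error is not None:
            try:
                on_error(exc, where)
            except Exception:
                # A broken error hook must not break the LLM call either.
                logger.warning("on_error callback raised; ignoring")
        return None
```

In a wrapper, only extraction and enqueueing would run through such a guard; the provider call itself stays outside it so that provider exceptions still reach your code unchanged.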

The SDK ships with default metric codes (llm_input_tokens, llm_output_tokens, etc.). You need to register matching billable metrics in your Lago tenant before events count toward charges. See the Lago docs: Billable Metrics.
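Registering the metrics can be scripted against Lago's REST API (`POST /api/v1/billable_metrics`, authenticated with a Bearer API key). The sketch below only builds the requests, so it runs without credentials. The helper name `billable_metric_request`, the `sum_agg` aggregation, and the `value` field name are assumptions; `field_name` must match whichever event property the SDK actually sends, so check it against the SDK source and the Lago Billable Metrics docs before use.

```python
import json
import urllib.request


def billable_metric_request(
    api_url: str, api_key: str, code: str, field_name: str = "value"
) -> urllib.request.Request:
    """Build (but do not send) a request that creates one sum-based billable metric."""
    body = {
        "billable_metric": {
            "name": code,
            "code": code,              # must equal the metric code the SDK emits
            "aggregation_type": "sum_agg",
            "field_name": field_name,  # assumed event property holding the count
        }
    }
    return urllib.request.Request(
        api_url.rstrip("/") + "/billable_metrics",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To create the metrics, pass the request for each code (for example `llm_input_tokens` and `llm_output_tokens`) to `urllib.request.urlopen`.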

Run the unit tests from a source checkout:

git clone https://github.com/getlago/lago-agent-sdk-python
cd lago-agent-sdk-python
python -m venv venv && source venv/bin/activate
pip install -e '.[dev]'
pytest

Run live integration tests (requires real credentials):

AWS_BEARER_TOKEN_BEDROCK="..." \
MISTRAL_API_KEY="..." \
LAGO_API_URL="https://api.getlago.com/api/v1/" \
LAGO_API_KEY="..." \
LAGO_EXTERNAL_SUBSCRIPTION_ID="sub_..." \
pytest tests/integration

Found a vulnerability? See SECURITY.md.
