Lago Open-source SDK: Bill on top of your LLM token cost with no middleware

Lago has released an open-source SDK that wraps existing LLM clients, extracts token usage from each response, and sends it to Lago's billing platform without requiring middleware or changes to your application's API calls. The SDK currently supports AWS Bedrock and Mistral, with a p99 wrap overhead under 5 milliseconds. It buffers usage events in memory and flushes them in batches. Exponential backoff and a bounded buffer let it survive provider or Lago outages. Developers can bill customers on LLM token consumption by attaching a subscription ID per call, per request context, or as a default fallback.
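The summary describes three behaviours: in-memory buffering, batched flushing, and exponential backoff. A minimal sketch of that pattern follows. It is not the SDK's implementation. `BatchBuffer`, its parameters, and the drop-oldest policy are hypothetical. The HTTP call to Lago is replaced by an injected `send` callable so the logic can run offline.

```python
import collections
import random
import time
from typing import Callable, Deque, List


class BatchBuffer:
    """Hypothetical bounded buffer: enqueue is O(1), flush sends batches with backoff."""

    def __init__(
        self,
        send: Callable[[List[dict]], None],
        max_events: int = 10_000,
        batch_size: int = 100,
        max_retries: int = 5,
        base_delay: float = 0.5,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # deque(maxlen=...) discards the OLDEST event when full, so memory stays
        # bounded during a long outage (this drop policy is an assumption).
        self._queue: Deque[dict] = collections.deque(maxlen=max_events)
        self._send = send
        self._batch_size = batch_size
        self._max_retries = max_retries
        self._base_delay = base_delay
        self._sleep = sleep

    def __len__(self) -> int:
        return len(self._queue)

    def enqueue(self, event: dict) -> None:
        # Called on the request path: a plain append, no network I/O.
        self._queue.append(event)

    def _send_with_backoff(self, batch: List[dict]) -> bool:
        for attempt in range(self._max_retries):
            try:
                self._send(batch)
                return True
            except Exception:
                if attempt == self._max_retries - 1:
                    break  # out of retries; no sleep after the last attempt
                # Exponential backoff with jitter: base * 2**attempt, scaled by 0.5-1.0.
                delay = self._base_delay * (2 ** attempt)
                self._sleep(delay * random.uniform(0.5, 1.0))
        return False

    def flush(self) -> int:
        """Drain the queue in batches; return the number of events delivered."""
        delivered = 0
        while self._queue:
            size = min(self._batch_size, len(self._queue))
            batch = [self._queue.popleft() for _ in range(size)]
            if not self._send_with_backoff(batch):
                # Still failing: restore the batch at the front for the next flush.
                self._queue.extendleft(reversed(batch))
                break
            delivered += len(batch)
        return delivered
```

A real client would presumably call `flush()` from a background thread or timer so the caller never waits on Lago. That scheduling part is omitted here. The README does not say whether the SDK drops the oldest or the newest events once its buffer is full.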
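The three-level subscription precedence (per call, then per request context, then default) can be sketched with the standard-library `contextvars` module. The SDK's README says `contextvars` backs the real implementation. The names `set_subscription` and `resolve_subscription` only mirror the README's vocabulary. Their bodies, and the `extra_lago` dict handling, are assumptions rather than the SDK's code.

```python
import asyncio
import contextvars
from typing import Optional

# Holder for the "context-bound" subscription. Each thread has its own context,
# and each asyncio task runs in a copy of its creator's context, so a value set
# inside one task is invisible to sibling tasks and to the caller.
_current_subscription = contextvars.ContextVar("lago_subscription", default=None)


def set_subscription(subscription_id: str) -> None:
    """Hypothetical: what request middleware would call once per request."""
    _current_subscription.set(subscription_id)


def resolve_subscription(extra_lago: Optional[dict], default: Optional[str]) -> Optional[str]:
    """Hypothetical resolver applying the README's priority order."""
    # 1. Per-call override (highest precedence).
    if extra_lago and extra_lago.get("subscription"):
        return extra_lago["subscription"]
    # 2. Context-bound value for the current thread / asyncio task.
    bound = _current_subscription.get()
    if bound is not None:
        return bound
    # 3. Default supplied at init (fallback).
    return default


async def handle_request(subscription_id: str) -> Optional[str]:
    set_subscription(subscription_id)
    await asyncio.sleep(0)  # yield so the two tasks interleave
    return resolve_subscription(None, default="sub_default")


async def main() -> list:
    # gather() wraps each coroutine in its own Task with its own context copy.
    return await asyncio.gather(handle_request("sub_acme"), handle_request("sub_globex"))


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['sub_acme', 'sub_globex']
```

A module-level global would not work here. The two interleaved requests would overwrite each other's subscription. With a `ContextVar`, each task keeps its own value.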

Instrument LLM clients and emit usage events to [Lago](https://www.getlago.com) for billing.

```
                 ┌──────────────┐
your code ─────► │wrapped client│ ──► provider (Bedrock / Mistral / …)
                 └──────┬───────┘
                        │ extract usage
                        ▼
                 ┌──────────────┐
                 │ Lago events  │ ──► api.getlago.com
                 └──────────────┘
```

- Wraps your existing LLM client in place, with no API surface change for your application code.
- Extracts usage from each response into a normalized shape (`CanonicalUsage`).
- Buffers events in memory and flushes them in batches to Lago's `/events/batch` endpoint.
- Survives provider and Lago outages with exponential backoff and a bounded buffer.
- Keeps p99 wrap overhead under 5 ms: your call is never blocked on Lago.

```
pip install lago-agent-sdk
```

For Bedrock support, `pip install 'lago-agent-sdk[bedrock]'` (adds `boto3`). For Mistral support, `pip install 'lago-agent-sdk[mistral]'` (adds `mistralai`).

```python
import boto3
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(
    api_key="<YOUR_LAGO_API_KEY>",
    api_url="https://api.getlago.com/api/v1/",
    default_subscription_id="sub_acme",
)

# Wrap the boto3 client, then use it exactly as before.
client = sdk.wrap(boto3.client("bedrock-runtime", region_name="eu-west-1"))

resp = client.converse(
    modelId="eu.amazon.nova-lite-v1:0",
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

# Push buffered usage events to Lago (e.g. before a short script exits).
sdk.flush()
```

The wrapped client behaves identically to the original: same arguments, same return shape, same exceptions. The SDK adds an in-memory queue that batches events to Lago in the background.

```python
from mistralai.client import Mistral
from lago_agent_sdk import LagoSDK

sdk = LagoSDK(api_key="...", default_subscription_id="sub_acme")
client = sdk.wrap(Mistral(api_key="..."))

resp = client.chat.complete(
    model="mistral-small-latest",
    messages=[{"role": "user", "content": "Hello"}],
)
sdk.flush()
```

There are three ways to set the `external_subscription_id`, in priority order:

1. Per-call override (highest precedence):

   ```python
   client.converse(..., extra_lago={"subscription": "sub_acme", "dimensions": {"feature": "summarize"}})
   ```

2. Context-bound (use it in middleware to set the subscription once per request):

   ```python
   sdk.set_subscription("sub_acme")
   # all calls in this thread/asyncio task → sub_acme
   ```

3. Default at init (fallback):

   ```python
   sdk = LagoSDK(api_key="...", default_subscription_id="sub_default")
   ```

Context-bound values are backed by `contextvars`, so they propagate safely across asyncio tasks.

| Provider | Access | Status |
|---|---|---|
| AWS Bedrock | Converse (sync + stream) | ✓ |
| AWS Bedrock | InvokeModel (sync + stream, 7 model families) | ✓ |
| Mistral | native SDK (`chat.complete` + `chat.stream`) | ✓ |
| OpenAI | native SDK | Phase 2 |
| Anthropic | native SDK | Phase 2 |
| Google Gemini | native SDK | Phase 2 |
| LiteLLM | callback bridge | Phase 4 |

`CanonicalUsage` carries 10 numeric fields. Which ones are populated depends on the provider:

| Field | Lago metric code | Bedrock | Mistral (native) |
|---|---|---|---|
| `input` | `llm_input_tokens` | ✓ | ✓ |
| `output` | `llm_output_tokens` | ✓ | ✓ |
| `cache_read` | `llm_cached_input_tokens` | ✓ (Anthropic) | ✓ (when the cache hits) |
| `cache_write` | `llm_cache_creation_tokens` | ✓ (Anthropic) | ✗ |
| `cache_write_5m` / `cache_write_1h` | `llm_cache_write_5m/1h_tokens` | ✓ (Anthropic, InvokeModel) | ✗ |
| `reasoning` | `llm_reasoning_tokens` | ✗ (folded into output) | ✗ (folded into output) |
| `tool_calls` | `llm_tool_calls` | ✓ | ✓ |
| `image_input` / `audio_input` | `llm_image/audio_input_tokens` | ✗ | ✗ |

The reasoning, image, and audio fields will be populated once native OpenAI support ships in Phase 2.

The SDK never breaks your LLM call. If anything in instrumentation fails (an adapter bug, Lago being down, a network error), the SDK swallows the error, logs a warning, and your call returns normally. You can hook into these failures through the `LagoConfig.on_error` callback, for example to report them to Sentry or Datadog:

```python
import sentry_sdk
from lago_agent_sdk import LagoConfig, LagoSDK

def on_error(exc: Exception, where: str) -> None:
    # `where` names the SDK phase that failed; forward it as a Sentry tag.
    sentry_sdk.capture_exception(exc, tags={"sdk_phase": where})

sdk = LagoSDK(
    api_key="...",
    config=LagoConfig(api_key="...", on_error=on_error),
)
```

The SDK ships with default metric codes (`llm_input_tokens`, `llm_output_tokens`, etc.).
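The `CanonicalUsage` normalization can be made concrete with a sketch. It maps a Bedrock Converse response onto the ten fields from the usage table, then turns each non-zero field into a Lago event suitable for the `/events/batch` endpoint. The following are hypothetical and not taken from the SDK: the adapter function, the snake_case field names, the `value` property key, the `dimensions` merge, counting one tool call per `toolUse` block, and the expanded codes for the 5m/1h and image/audio rows. The Converse `usage` keys (`inputTokens`, `outputTokens`, `cacheReadInputTokens`, `cacheWriteInputTokens`) and the Lago event fields (`transaction_id`, `external_subscription_id`, `code`, `timestamp`, `properties`) follow the public Bedrock and Lago APIs.

```python
import time
import uuid
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class CanonicalUsage:
    """Sketch of the 10 numeric fields listed in the usage table."""
    input: int = 0
    output: int = 0
    cache_read: int = 0
    cache_write: int = 0
    cache_write_5m: int = 0
    cache_write_1h: int = 0
    reasoning: int = 0
    tool_calls: int = 0
    image_input: int = 0
    audio_input: int = 0


# Field -> Lago metric code, following the table. The 5m/1h and image/audio
# codes are ASSUMED expansions of the table's shorthand.
METRIC_CODES: Dict[str, str] = {
    "input": "llm_input_tokens",
    "output": "llm_output_tokens",
    "cache_read": "llm_cached_input_tokens",
    "cache_write": "llm_cache_creation_tokens",
    "cache_write_5m": "llm_cache_write_5m_tokens",
    "cache_write_1h": "llm_cache_write_1h_tokens",
    "reasoning": "llm_reasoning_tokens",
    "tool_calls": "llm_tool_calls",
    "image_input": "llm_image_input_tokens",
    "audio_input": "llm_audio_input_tokens",
}


def usage_from_converse(response: Dict[str, Any]) -> CanonicalUsage:
    """Hypothetical adapter for a bedrock-runtime converse() response dict."""
    usage = response.get("usage", {})
    content = response.get("output", {}).get("message", {}).get("content", [])
    return CanonicalUsage(
        input=usage.get("inputTokens", 0),
        output=usage.get("outputTokens", 0),
        cache_read=usage.get("cacheReadInputTokens", 0),
        cache_write=usage.get("cacheWriteInputTokens", 0),
        # Assumption: one tool call per toolUse content block.
        tool_calls=sum(1 for block in content if "toolUse" in block),
    )


def usage_to_events(
    usage: CanonicalUsage,
    subscription_id: str,
    dimensions: Optional[Dict[str, str]] = None,
    timestamp: Optional[float] = None,
) -> List[Dict[str, Any]]:
    """Build one Lago event per non-zero usage field."""
    ts = time.time() if timestamp is None else timestamp
    events = []
    for f in fields(usage):
        value = getattr(usage, f.name)
        if not value:
            continue  # the provider did not populate this field
        events.append(
            {
                "transaction_id": str(uuid.uuid4()),  # lets Lago deduplicate resent events
                "external_subscription_id": subscription_id,
                "code": METRIC_CODES[f.name],
                "timestamp": ts,
                # "value" is a hypothetical key; it must match the metric's field_name.
                "properties": {"value": value, **(dimensions or {})},
            }
        )
    return events
```

A batch request body would then be `{"events": usage_to_events(...)}`. Skipping zero-valued fields keeps providers that do not report, say, cache writes from emitting empty events.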
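The "never breaks your LLM call" guarantee comes down to wrapping only the instrumentation step in `try`/`except`, never the provider call itself. Below is a minimal sketch of that pattern. The `instrumented` decorator and `record_usage` hook are hypothetical and not SDK APIs. Only the `(exc, where)` callback shape mirrors the `on_error` example.

```python
import functools
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("lago_sketch")

# Same shape as the README's on_error(exc, where) callback.
OnError = Callable[[Exception, str], None]


def instrumented(record_usage: Callable[[Any], None], on_error: Optional[OnError] = None):
    """Hypothetical decorator: provider errors propagate, instrumentation errors never do."""

    def decorator(call: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(call)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Deliberately outside try/except: the caller sees exactly the
            # exceptions the raw client would raise.
            response = call(*args, **kwargs)
            try:
                record_usage(response)  # extract usage and enqueue the event
            except Exception as exc:  # adapter bug, malformed response, ...
                logger.warning("Lago instrumentation failed: %r", exc)
                if on_error is not None:
                    try:
                        on_error(exc, "record_usage")
                    except Exception:
                        # A broken reporting hook must not break the call either.
                        logger.warning("on_error callback raised", exc_info=True)
            return response

        return wrapper

    return decorator
```

Guarding the `on_error` call as well matters in practice. An exception thrown by the error-reporting hook (for example, Sentry being unreachable) would otherwise surface in application code.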
Before events count toward charges, you need to register billable metrics in your Lago tenant whose codes match the SDK's default metric codes. See [Lago docs: Billable Metrics](https://docs.getlago.com/api-reference/billable-metrics/create).

To develop locally and run the unit tests:

```
git clone https://github.com/getlago/lago-agent-sdk-python
cd lago-agent-sdk-python
python -m venv venv && source venv/bin/activate
pip install -e '.[dev]'
pytest
```

Run the live integration tests (requires real credentials):

```
AWS_BEARER_TOKEN_BEDROCK="..." \
MISTRAL_API_KEY="..." \
LAGO_API_URL="https://api.getlago.com/api/v1/" \
LAGO_API_KEY="..." \
LAGO_EXTERNAL_SUBSCRIPTION_ID="sub_..." \
pytest tests/integration
```

Found a vulnerability? See [SECURITY.md](/getlago/lago-agent-sdk-python/blob/main/SECURITY.md).