{"slug": "show-hn-tokentoll-a-ci-gate-for-llm-api-cost-regressions", "title": "Show HN: Tokentoll, a CI gate for LLM API cost regressions", "summary": "Tokentoll, a new CI gate for LLM API costs, statically analyzes Python, JavaScript, and TypeScript code in pull requests to detect cost regressions before deployment. The tool scores each PR against a configurable policy and posts a PASS/WARN/FAIL verdict directly on the pull request, with the option to fail the workflow when policy violations occur. When that option is enabled, expensive model swaps or excessive API calls cannot be merged into production code unnoticed.", "body_md": "Prevent LLM cost regressions before production.\n\ntokentoll is a CI gate for LLM cost. It statically analyzes Python, JavaScript, and TypeScript for LLM API calls, scores every pull request against a policy you control, and posts a PASS/WARN/FAIL verdict directly on the PR. Optionally, it fails the workflow when the policy is violated, so cost regressions cannot be merged.\n\n[Jwrede/tokentoll-demo](https://github.com/Jwrede/tokentoll-demo) is a small polyglot LLM app (Python + TypeScript) wired up to the tokentoll cost gate. Two PRs are already open against it:\n\n- [PR #1: Add Anthropic Haiku translation helper](https://github.com/Jwrede/tokentoll-demo/pull/1). New call site, well within budget. Verdict: PASS, workflow green.\n- [PR #2: switch supportbot to gpt-4o](https://github.com/Jwrede/tokentoll-demo/pull/2). A model swap that trips two policy rules. Verdict: FAIL, workflow red.\n\nOpen each PR's conversation tab to see the verdict comment tokentoll actually posts.\n\nWhen a PR violates your policy, tokentoll comments with a verdict and a blocking-findings list, then exits non-zero so the check fails. 
Example:\n\n```\n## tokentoll verdict: FAIL\n\n**Blocking findings (2):**\n\n- `src/agent.py:42` - per-call cost grew 15.0x (threshold 5x)\n- total monthly delta +$812.00 exceeds budget $250.00\n\n> Required action: revert the regression, raise the threshold in `.tokentoll.yml`, or add an exemption.\n```\n\nWhen the PR is clean, the verdict is PASS and the comment shows only the cost delta table. When no policy is configured, tokentoll posts an informational delta comment with no verdict.\n\nTo set it up, add `.github/workflows/tokentoll.yml`:\n\n```\nname: tokentoll\non:\n  pull_request:\n    paths:\n      - \"**.py\"\n      - \"**.ts\"\n      - \"**.tsx\"\n      - \"**.js\"\n      - \"**.jsx\"\n\npermissions:\n  contents: read\n  pull-requests: write\n\njobs:\n  cost-gate:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v4\n        with:\n          fetch-depth: 0\n      - uses: Jwrede/tokentoll@v0.7.0\n        with:\n          fail-on-policy-violation: true\n```\n\nThen add `.tokentoll.yml` to your repo root:\n\n```\nbudgets:\n  max_monthly_delta_usd: 250\n  max_callsite_monthly_usd: 100\n  max_relative_increase: 5.0\n\npolicies:\n  block_unknown_models: true\n  fail_on_policy_violation: true\n```\n\nFrom then on, every PR that touches the listed source file types receives a verdict comment, and PRs that exceed the thresholds fail the workflow.\n\nFor SHA-pinned installs and minimal-permissions setups, see [docs/github-action.md](/Jwrede/tokentoll/blob/main/docs/github-action.md). For the full policy schema, see [docs/policy.md](/Jwrede/tokentoll/blob/main/docs/policy.md). 
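
The FAIL verdict example earlier in this section reports two numbers: a per-call growth multiplier and a monthly dollar delta. The sketch below shows one way to derive them under these assumptions:

- Per-call cost is input tokens times the input price plus output tokens times the output price, with prices quoted per million tokens.
- Monthly cost is per-call cost times an assumed `calls_per_month`.

The `Price` class, both helpers, the token counts, and the prices are hypothetical illustrations. They are not tokentoll internals or real vendor prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    # Hypothetical USD prices per one million tokens
    input_per_m: float
    output_per_m: float


def per_call_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    # Token counts scaled by per-million-token prices
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


def monthly_cost(cost_per_call: float, calls_per_month: int) -> float:
    # Uniform-volume assumption: the same number of calls every month
    return cost_per_call * calls_per_month


# Made-up prices for an old and a new model at one call site
old_price = Price(input_per_m=0.50, output_per_m=1.50)
new_price = Price(input_per_m=5.00, output_per_m=15.00)

old = per_call_cost(old_price, input_tokens=1000, output_tokens=500)
new = per_call_cost(new_price, input_tokens=1000, output_tokens=500)

ratio = new / old
delta = monthly_cost(new, 5000) - monthly_cost(old, 5000)
print(f'per-call cost grew {ratio:.1f}x, monthly delta +${delta:.2f}')
```

With these made-up numbers, a 10x price increase turns a $6.25-per-month call site into a $62.50 one, a monthly delta of $56.25.
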
For the security posture, see [docs/security.md](/Jwrede/tokentoll/blob/main/docs/security.md).\n\nSupported SDKs and detected call patterns:\n\n**Python**\n\n| SDK | Patterns |\n|---|---|\n| OpenAI | `chat.completions.create`, `responses.create` |\n| Anthropic | `messages.create`, `messages.stream` |\n| Google GenAI | `models.generate_content` |\n| LiteLLM | `completion`, `acompletion` |\n| LangChain | `ChatOpenAI`, `ChatAnthropic`, `init_chat_model` |\n| Zhipu AI | `ZhipuAiClient`, `ZhipuAI` (GLM models) |\n\n**JavaScript / TypeScript** (parsed via tree-sitter; handles `.js`, `.jsx`, `.ts`, and `.tsx`)\n\n| SDK | Patterns |\n|---|---|\n| OpenAI Node SDK | `client.chat.completions.create`, `client.responses.create`, `client.embeddings.create` |\n| Anthropic SDK | `client.messages.create`, `client.messages.stream` |\n| Vercel AI SDK | `generateText`, `streamText`, `generateObject`, `streamObject`, `embed`, `embedMany` |\n| LangChain.js | `new ChatOpenAI`, `new ChatAnthropic`, `new ChatGoogleGenerativeAI`, ... |\n| OpenAI-compatible | same shape as the OpenAI Node SDK; picked up automatically |\n\nThe policy block in `.tokentoll.yml` controls when a PR fails:\n\n| Rule | Trigger |\n|---|---|\n| `budgets.max_monthly_delta_usd` | total estimated monthly delta exceeds the threshold |\n| `budgets.max_callsite_monthly_usd` | estimated monthly cost of any new or changed call site exceeds the threshold |\n| `budgets.max_relative_increase` | per-call cost of any modified call site grows by more than this multiplier |\n| `policies.block_unknown_models` | any new or modified call site uses an unpriced or unresolved model |\n| `policies.fail_on_policy_violation` | `tokentoll diff` exits 1 on FAIL (CI gate behavior) |\n\nEach rule is independent. Leave a field unset to disable that rule. 
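
The rules in the policy table act as independent checks whose findings combine into a single verdict. The sketch below illustrates that idea under stated assumptions:

- `CallSiteDelta`, `Policy`, and `evaluate` are hypothetical names.
- Per-call costs are assumed to come from an earlier diff step.
- Any finding yields FAIL.
- WARN is left out because this page does not say when it is issued.
- `fail_on_policy_violation` is omitted because it only controls the exit code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CallSiteDelta:
    # Hypothetical diff record for one new or modified call site
    location: str
    old_per_call: Optional[float]  # None when the call site is new
    new_per_call: Optional[float]  # None when the model is unpriced or unresolved
    calls_per_month: int


@dataclass
class Policy:
    # Field names mirror .tokentoll.yml keys; None disables a budget rule
    max_monthly_delta_usd: Optional[float] = None
    max_callsite_monthly_usd: Optional[float] = None
    max_relative_increase: Optional[float] = None
    block_unknown_models: bool = False


def evaluate(policy: Policy, sites: List[CallSiteDelta]) -> Tuple[str, List[str]]:
    findings: List[str] = []
    total_delta = 0.0
    for s in sites:
        if s.new_per_call is None:
            # An unpriced model can be flagged but not costed
            if policy.block_unknown_models:
                findings.append(f'{s.location} - unpriced or unresolved model')
            continue
        new_monthly = s.new_per_call * s.calls_per_month
        old_monthly = (s.old_per_call or 0.0) * s.calls_per_month
        total_delta += new_monthly - old_monthly
        cap = policy.max_callsite_monthly_usd
        if cap is not None and new_monthly > cap:
            findings.append(f'{s.location} - monthly cost ${new_monthly:.2f} exceeds ${cap:.2f}')
        # Relative growth needs a priced baseline, so it only applies to modified sites
        limit = policy.max_relative_increase
        if limit is not None and s.old_per_call:
            growth = s.new_per_call / s.old_per_call
            if growth > limit:
                findings.append(f'{s.location} - per-call cost grew {growth:.1f}x (threshold {limit}x)')
    budget = policy.max_monthly_delta_usd
    if budget is not None and total_delta > budget:
        findings.append(f'total monthly delta +${total_delta:.2f} exceeds budget ${budget:.2f}')
    return ('FAIL' if findings else 'PASS'), findings


policy = Policy(max_monthly_delta_usd=250, max_callsite_monthly_usd=100,
                max_relative_increase=5.0, block_unknown_models=True)
swap = CallSiteDelta('src/agent.py:42', old_per_call=0.001, new_per_call=0.015,
                     calls_per_month=5000)
verdict, findings = evaluate(policy, [swap])
print(verdict, findings)
```

The sample swap mirrors the 15x growth in the verdict example. It stays within both dollar budgets ($75 per month at the call site, a +$70 total delta) but trips `max_relative_increase`.
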
The full reference is in [docs/policy.md](/Jwrede/tokentoll/blob/main/docs/policy.md).\n\nTo use the CLI locally:\n\n```\npip install tokentoll\n\n# Scan current directory for LLM API calls and their costs\ntokentoll scan .\n\n# Show cost impact of your last commit\ntokentoll diff HEAD~1\n\n# Compare two refs and fail on policy violation\ntokentoll diff main..HEAD --fail-on-policy-violation\n```\n\nSubcommands:\n\n```\ntokentoll scan [PATH...] [--format table|json|markdown] [--calls-per-month N] [--config PATH]\ntokentoll diff [REF] [--base REF] [--head REF] [--format table|json|markdown|github-comment]\n               [--config PATH] [--fail-on-policy-violation]\ntokentoll update    # refresh bundled pricing data from LiteLLM\n```\n\n`.tokentoll.yml` lives in the repo root and is auto-discovered. Beyond the policy block, it accepts:\n\n```\n# Per-SDK defaults for dynamic (runtime-resolved) model names\ndefault_models:\n  openai: gpt-4o-mini\n  anthropic: claude-3-haiku-20240307\n\n# Assumed monthly call volume per call site (used for dollar estimates)\ncalls_per_month: 5000\n\n# Skip cost estimation for dynamic models entirely.\n# Default false: dynamic calls are priced against the per-SDK default.\nskip_dynamic_models: false\n\n# Default excludes (tests/, examples/, docs/, cookbook/, benchmarks/, evals/,\n# scripts/, notebooks/) are applied automatically. Opt out with:\nuse_default_excludes: false\n\n# Additional excludes (prefix or glob)\nexclude:\n  - \"*_test.py\"\n  - vendor/\n\n# Per-path overrides (longest prefix match)\noverrides:\n  - path: src/agents/\n    default_model: gpt-4o\n    calls_per_month: 10000\n  - path: src/azure/\n    skip_dynamic_models: true\n```\n\nResolution order for dynamic model defaults: `default_models` (per-SDK) > `default_model` (generic) > built-in SDK defaults.\n\ntokentoll requires no API keys, sends no telemetry, and runs entirely inside your CI environment. Pricing data ships with the package and can be refreshed from LiteLLM on demand with `tokentoll update`. 
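
The per-path `overrides` (longest prefix match) and the stated resolution order for dynamic model defaults in the configuration section can be sketched as plain lookups. The sketch makes three assumptions:

- The config is a dict shaped like a subset of the `.tokentoll.yml` example.
- A key set on the matching override replaces the top-level value for that key.
- `BUILTIN_DEFAULTS` holds hypothetical placeholder names.

How tokentoll actually merges top-level and per-path settings is not specified on this page.

```python
from typing import Any, Dict, List, Optional

# Subset of the .tokentoll.yml example, as a plain dict
CONFIG: Dict[str, Any] = {
    'default_models': {'openai': 'gpt-4o-mini'},
    'calls_per_month': 5000,
    'overrides': [
        {'path': 'src/agents/', 'default_model': 'gpt-4o', 'calls_per_month': 10000},
        {'path': 'src/azure/', 'skip_dynamic_models': True},
    ],
}

# Hypothetical placeholders; the real built-in defaults are internal to tokentoll
BUILTIN_DEFAULTS: Dict[str, str] = {
    'openai': 'example-openai-default',
    'anthropic': 'example-anthropic-default',
}


def find_override(path: str, overrides: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    # Longest prefix match: the most specific matching override wins
    matches = [o for o in overrides if path.startswith(o['path'])]
    return max(matches, key=lambda o: len(o['path']), default=None)


def effective_setting(path: str, key: str, config: Dict[str, Any]) -> Any:
    # Assumption: a key set on the matching override replaces the top-level value
    override = find_override(path, config.get('overrides', []))
    if override is not None and key in override:
        return override[key]
    return config.get(key)


def resolve_default_model(sdk: str, default_models: Dict[str, str],
                          default_model: Optional[str]) -> str:
    # Stated order: default_models (per-SDK) > default_model (generic) > built-in
    if sdk in default_models:
        return default_models[sdk]
    if default_model is not None:
        return default_model
    return BUILTIN_DEFAULTS[sdk]


agents_calls = effective_setting('src/agents/planner.py', 'calls_per_month', CONFIG)
other_calls = effective_setting('src/app.py', 'calls_per_month', CONFIG)
agents_default = effective_setting('src/agents/planner.py', 'default_model', CONFIG)
print(agents_calls, other_calls)
print(resolve_default_model('openai', CONFIG['default_models'], agents_default))
print(resolve_default_model('anthropic', CONFIG['default_models'],
                            effective_setting('src/app.py', 'default_model', CONFIG)))
```

Taken literally, the stated order lets a per-SDK `default_models` entry outrank a per-path `default_model`. That is why the openai lookup under `src/agents/` still returns `gpt-4o-mini` here. The real precedence between top-level and per-path keys may differ.
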
For the recommended permission set, SHA pinning, and fork PR risk, see [docs/security.md](/Jwrede/tokentoll/blob/main/docs/security.md).\n\ntokentoll ships an MCP (Model Context Protocol) server so Claude Code and other MCP hosts can check the cost impact of LLM code changes from inside an agent conversation:\n\n```\npip install tokentoll[mcp]\nclaude mcp add --transport stdio tokentoll -- tokentoll-mcp\n```\n\nTwo tools are exposed: `scan` (estimate costs across a path) and `diff` (compare two refs). Both return JSON.\n\nArchitecture:\n\n```\n  Source code (.py, .ts, .tsx, .js, .jsx)\n        |\n        v\n  +----------------+   +------------------+\n  | AST scanners   |-->| SDK detectors    |\n  | ast (Python) + |   | OpenAI, Anthropic|\n  | tree-sitter    |   | Google, LiteLLM, |\n  | (JS/TS)        |   | LangChain, Zhipu,|\n  +----------------+   | Vercel AI SDK    |\n                       +------------------+\n                              |\n                              v\n                       +------------------+\n                       | Pricing engine   |\n                       | 2200+ models     |\n                       +------------------+\n                              |\n                              v\n                       +------------------+\n                       | Diff engine      |\n                       | (old vs new)     |\n                       +------------------+\n                              |\n                              v\n                       +------------------+\n                       | Policy evaluator |\n                       | PASS/WARN/FAIL   |\n                       +------------------+\n                              |\n                              v\n                       +------------------+\n                       | PR comment / CLI |\n                       | output           |\n                       +------------------+\n```\n\nA multi-pass constant propagation engine resolves model names through variable 
assignments, `os.getenv()` / `process.env.X` fallbacks, function defaults, class attributes, constructor arguments, dict and object literals, `**kwargs` unpacking, and Vercel AI SDK provider wrappers (`openai(\"gpt-4o\")`), so real-world code with indirection still produces useful estimates.\n\nPricing is bundled and works offline. To refresh from LiteLLM:\n\n```\ntokentoll update\n```\n\nCoverage: 300+ models across OpenAI, Anthropic, Google, AWS Bedrock, Azure, and more, plus 2200+ entries from LiteLLM's combined catalog.\n\nLimitations:\n\n- Static analysis only. Models loaded from databases or remote config cannot be resolved; tokentoll falls back to the configured per-SDK default and marks the call site as `(default)`.\n- Token estimates use a characters/4 heuristic unless [tiktoken](https://github.com/openai/tiktoken) is installed (`pip install tokentoll[tiktoken]`).\n- Monthly estimates assume uniform call volume per call site. Override per-project with `calls_per_month` or per-path with `overrides`.\n- JS/TS resolution is same-file only. 
Importing a model name from another module produces a dynamic call site rather than a resolved value.\n\nRoadmap:\n\n- **v0.9**: Public demo repo with a known-failing PR, gpt-researcher case study, expanded adoption section.\n- **Future**: Context-aware call frequency inference (FastAPI routes versus scripts versus loops); cross-file import resolution for JS/TS.\n\nLicense: MIT", "url": "https://wpnews.pro/news/show-hn-tokentoll-a-ci-gate-for-llm-api-cost-regressions", "canonical_source": "https://github.com/Jwrede/tokentoll", "published_at": "2026-05-30 12:41:53+00:00", "updated_at": "2026-05-30 13:16:30.688942+00:00", "lang": "en", "topics": ["ai-tools", "mlops", "large-language-models", "ai-infrastructure", "ai-products"], "entities": ["Tokentoll", "Anthropic", "Haiku", "GPT-4o", "Jwrede"], "alternates": {"html": "https://wpnews.pro/news/show-hn-tokentoll-a-ci-gate-for-llm-api-cost-regressions", "markdown": "https://wpnews.pro/news/show-hn-tokentoll-a-ci-gate-for-llm-api-cost-regressions.md", "text": "https://wpnews.pro/news/show-hn-tokentoll-a-ci-gate-for-llm-api-cost-regressions.txt", "jsonld": "https://wpnews.pro/news/show-hn-tokentoll-a-ci-gate-for-llm-api-cost-regressions.jsonld"}}