How LLMs Now Monitor and Cut Their Own Token Spend

Skillware v0.4.0 introduces a new token limiter skill that allows LLM agents to monitor and cut their own token spend. The skill acts as a budget gate, returning actions like CONTINUE, WARN, or FORCE_TERMINATE based on cumulative token usage against a set ceiling. It is provider-neutral and requires the orchestrator to act on its decisions.

You have seen this loop before. An agent starts a “simple” task: scrape listings, refactor a repo, research a market. It fails, retries, re-reads context, apologizes, and tries all over again. Twenty minutes in, the dashboard shows six figures of tokens and zero useful deliverables.

The model did not misbehave on purpose. The orchestrator never had a hard budget gate with an ROI in mind. Skillware v0.4.0 ships a new skill for exactly that gap: [monitoring/token_limiter](https://github.com/ARPAHLS/skillware/tree/main/skills/monitoring/token_limiter).

[Skillware](https://github.com/ARPAHLS/skillware) is an open registry of installable agent capabilities. Each skill is a bundle: `skill.py` (whose `execute` returns JSON), `instructions.md`, and `manifest.yaml`. You load by ID, adapt for your provider, and call `execute` on tool use. The model decides when, the skill decides how: predictably, every time.

That split matters for budget control. You do not want the LLM guessing whether it is “allowed” to spend more tokens. You want a small, auditable function that answers: continue, warn, or stop.

Token Limiter

This skill is a budget gate, not a kill switch wired into OpenAI or Anthropic. After each model turn, your host loop passes cumulative usage. The skill returns one of three actions:

| Action | Meaning |
|---|---|
| `CONTINUE` | Under the soft threshold: keep going |
| `WARN` | Approaching the limit (default 80%): tighten scope |
| `FORCE_TERMINATE` | Hard ceiling hit: stop the loop |

Important nuance: the skill does not cancel API sessions or kill processes. It returns a structured decision. Your orchestrator must act on it. That is by design: Skillware skills stay portable and provider-neutral. No skill-specific API keys. No network calls. Pure Python math on numbers you supply.

Picture a scrape task with a 100,000 token ceiling.
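To make that ceiling concrete, here is a minimal sketch of the three-way decision described above. It is not the skill's actual implementation: the `budget_gate` and `run_loop` names, the exact boundary comparisons (at or above 80% warns, at or above 100% terminates), and the sample token counts are all assumptions for illustration.

```python
from typing import Iterable, List


def budget_gate(current_tokens: int, max_tokens: int, warn_ratio: float = 0.8) -> str:
    """Return CONTINUE, WARN, or FORCE_TERMINATE for cumulative usage."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    # Assumption: reaching the ceiling exactly already counts as a hit.
    if current_tokens >= max_tokens:
        return "FORCE_TERMINATE"
    # Assumption: the soft threshold is inclusive.
    if current_tokens >= warn_ratio * max_tokens:
        return "WARN"
    return "CONTINUE"


def run_loop(cumulative_usage: Iterable[int], max_tokens: int) -> List[str]:
    """Hypothetical host loop: consult the gate after each turn."""
    actions = []
    for used in cumulative_usage:
        action = budget_gate(used, max_tokens)
        actions.append(action)
        if action == "FORCE_TERMINATE":
            break  # the host, not the gate, ends the loop
    return actions


if __name__ == "__main__":
    print(run_loop([40_000, 85_000, 125_000, 150_000], max_tokens=100_000))
```

With a 100,000-token ceiling, cumulative counts of 40,000, 85,000, and 125,000 yield `CONTINUE`, `WARN`, and `FORCE_TERMINATE`. The fourth turn never runs because the host breaks out of the loop.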
As cumulative usage climbs, `token_limiter` returns `WARN`, then `FORCE_TERMINATE` → host breaks the loop and surfaces the reason.

Minimal integration:

```python
from skillware.core.loader import SkillLoader

# Load the skill bundle by ID and instantiate the skill class.
bundle = SkillLoader.load_skill("monitoring/token_limiter")
skill = bundle["module"].TokenLimiterSkill()

# Pass cumulative usage for the task; the skill only does math on these numbers.
result = skill.execute({
    "task_id": "scrape_listings_101",
    "current_token_count": 125_000,
    "max_allowed_tokens": 100_000,
    "model_id": "gpt-4o",
})

# The skill only returns a decision; the host must act on it.
if result["action"] == "FORCE_TERMINATE":
    raise RuntimeError(result["reason"])
```

The host tracks cumulative `current_token_count` from whatever provider you use: usage metadata from the API, a local tokenizer, or your own accounting layer. The skill does not read billing dashboards for you.

Optional `model_id` maps to bundled list prices for indicative USD in the response. Handy for ops dashboards; not invoice-grade. Unknown models fall back to a blended rate with a warning in the payload.

Optional `turn_id` makes retries idempotent: same turn, same counts, same decision. No double-penalty if your loop replays a step.

The skill lives under a new `monitoring/` category, leaving room for more observability skills later. Its files: `budget.py`; `skill.py` (`BaseSkill` wrapper, in-memory turn cache); `instructions.md` (`FORCE_TERMINATE`); and `data/model_pricing.json`.

v1 enforces token limits only. ROI fields (`expected_outcome`, `outcome_delivered`, `roi_value_usd`) are accepted as scaffold for v2: outcome-aware gates later, without breaking the v1 contract today.

Runnable examples ship in the repo: a local loop simulation (`token_limiter_loop.py`), plus Gemini and Claude harnesses.

Install and try: `pip install skillware`

Catalog page: [docs/skills/token_limiter.md](https://github.com/ARPAHLS/skillware/blob/main/docs/skills/token_limiter.md)

Budget control pairs naturally with `optimization/prompt_rewriter`: compress bloated context. Running agents against contracts or wallets? Screen first with `finance/wallet_screening`, execute with `defi/evm_tx_handler`, and budget with `token_limiter`.

Autonomous agents without token guardrails are expensive experiments.
`monitoring/token_limiter` gives you a deterministic, testable answer to a simple question after every turn: continue, warn, or stop? It ships in Skillware v0.4.0 today. Load it once, wire it into your loop, and stop paying for agents that retry themselves into oblivion.

Links

[monitoring/token_limiter source](https://github.com/ARPAHLS/skillware/tree/main/skills/monitoring/token_limiter)

Questions, issues, or skill ideas welcome in the repo. If you are building agent infra, start with a budget gate: your finance team will thank you later.