{"slug": "understand-and-reduce-token-usage-with-contextspy-context-profiler", "title": "Understand and reduce token usage with ContextSpy context profiler", "summary": "ContextSpy, a context window profiler for large language models and AI coding agents, launched to help developers visualize and reduce token usage. The tool intercepts LLM API requests to analyze prompt composition and track context changes, addressing rising costs from input tokens that can outnumber output tokens by 20-50x in agentic workloads. It supports agents like GitHub Copilot, Claude Code, and opencode, offering a dashboard for live breakdowns of token categories.", "body_md": "[Quick start](#quick-start) | [Motivation](#why-should-i-care) | [What's new](/RimantasZ/contextspy/blob/main/docs/changelog.md) | [Install guide](/RimantasZ/contextspy/blob/main/docs/install.md) | [Coding agent setup](/RimantasZ/contextspy/blob/main/docs/cloud-mode.md) | [FAQ](/RimantasZ/contextspy/blob/main/docs/faq.md) | Supported agents\n\nContextSpy is a context window profiler for large language models and common agentic AI coding tools. It intercepts requests to an LLM API, analyzes and visualizes prompt composition, and tracks context changes between requests in the same session. Modern AI coding agents (GitHub Copilot, Claude Code, opencode, etc.) pack a lot into each LLM request: system prompts, tool definitions and results, file contents, conversation history. It's often unclear why a session is slow, expensive, or hitting the context limit. ContextSpy makes the invisible visible: you see a live breakdown of every token category for every request, across sessions, over time.\n\nThink of your favorite CPU or memory profiler, applied to the contents of an AI agent's context. You can optimize performance just by reviewing code, but having a profiler to capture and visualise snapshot data helps a lot. 
The same goes for LLM context optimisation.\n\nQuick setup for macOS (Apple Silicon); see the [install guide](/RimantasZ/contextspy/blob/main/docs/install.md) for Linux, Windows, and PyPI options:\n\n```\n# Install the latest binary release with Homebrew\nbrew tap RimantasZ/contextspy\nbrew install contextspy\n\n# Install the CA certificate into the system trust store (one-time, cloud mode only)\nsudo contextspy install-cert\n\n# Start the proxy (keep this terminal open)\ncontextspy start\n\n# In a new terminal: launch your coding agent through the proxy\n# contextspy run sets the required environment variables, so LLM requests are routed through the proxy\ncontextspy run claude <path to your project>\n# contextspy run opencode <path to your project>\n# contextspy run code <path to your project>\n```\n\nOpen [http://127.0.0.1:5173](http://127.0.0.1:5173) in your browser for the ContextSpy dashboard.\n\nIf something doesn't work, see the [troubleshooting section](/RimantasZ/contextspy/blob/main/docs/install.md#troubleshooting) in the install guide.\n\nAlternatively, see [configure your agent](/RimantasZ/contextspy/blob/main/docs/cloud-mode.md) for how to route LLM traffic through the proxy at `http://127.0.0.1:8888`.\n\n**Token costs are rising.** As AI agents take on more complex workflows and use cases, token consumption and the resulting\ncloud API bills keep growing. This also applies to AI coding agents and tools,\nwhere providers are gradually moving away from subsidized subscriptions and are either reducing token\nlimits or switching to usage-based billing (e.g. GitHub Copilot).\n\n**Input tokens are the major part.** When discussing AI model pricing, most people bring up token generation cost, because that's where the numbers look\nmost dramatic ($25 per million tokens for Opus 4.8 output vs $0.40 for gpt-5-nano). But in agentic\nworkloads, input tokens outnumber output tokens by 20-50x, or even more. 
So most of your API bill is driven\nby input context, not by the output the model generates.\n\n**AI coding agents = lots of input.** The expensive part is how quickly context accumulates: with every turn it fills up with additional tokens (system prompt, skills, tool definitions, tool results, file contents, conversation history).\nYou start with 5,000 to 10,000 tokens in a fresh session, but by turn 25 it might be 30,000 to 50,000; keep going and the session might hit the context window limit and start compacting. Every API call to the model sends the full context\nas part of the prompt, and this is where token consumption and costs skyrocket.\n\nWe have all been told that the more information we give the model, the more capable it will be. And there are models with context windows of 1M tokens or even more.\n\nThere are three ways you pay for extra (and sometimes unnecessary) information in your context:\n\n- **API costs**: even with near-perfect cache hits, input token costs outweigh output costs, often by an order of magnitude or more.\n- **Compute and latency**: larger contexts take considerably longer to process, especially with locally hosted models.\n- **Context rot**: with larger contexts, LLMs start to lose precision, with [100k being the limit](https://www.trychroma.com/research/context-rot) where rapid degradation starts. So you are paying for a more expensive model but getting the performance of a cheaper one, or even worse.\n\nContextSpy makes these costs visible so you can act on them.\n\nContextSpy starts an HTTPS proxy (or a reverse proxy for locally hosted models) that intercepts every request to an LLM, analyzes it, and stores it in a local SQLite database. A web server is also started on localhost and serves a dashboard to visualise all captured data.\n\nContextSpy does not send any data to the cloud. 
All data is stored locally on your machine.\n\nHowever, be aware that it runs a proxy that captures all traffic from the agent to the LLM provider and stores it locally so it can be displayed and analysed in the UI. The proxy and dashboard server are bound to localhost and are not exposed to external access, but they can still be accessed locally.\n\nThe intended use case is to run ContextSpy as a profiler in dedicated profiling and optimisation sessions, rather than keeping it running permanently as a monitoring tool.\n\nRequest contents are purged from the database after 7 days; only statistics are retained.\n\nThe database can be cleared manually by running `contextspy reset-db`.\nIn practice, it is recommended to do this from time to time.\n\nA new version can be installed with Homebrew:\n\n```\n# Optional: brew sometimes \"forgets\" a custom tap; add it again if the update alone fails\nbrew tap RimantasZ/contextspy\n# Update Homebrew and upgrade contextspy\nbrew update\nbrew upgrade contextspy\n```\n\nAt this stage, the database schema is subject to change, so it is advisable to purge the database before upgrading.\n\n| Layer | Technology |\n|---|---|\n| Backend | Python 3.11+ |\n| Frontend | [React](https://react.dev/) + [Vite](https://vitejs.dev/), [TanStack Query](https://tanstack.com/query), [Recharts](https://recharts.org/), [Tailwind CSS](https://tailwindcss.com/) |\n| CLI | [Typer](https://typer.tiangolo.com/) |\n| Proxy | [mitmproxy](https://mitmproxy.org/): TLS-terminating forward proxy (cloud) and reverse proxy (local) |\n| Storage | [SQLAlchemy](https://www.sqlalchemy.org/): all data local in `~/.contextspy/` |\n| Token estimation | [tiktoken](https://github.com/openai/tiktoken) (`cl100k_base`) |\n| Packaging | [uv](https://github.com/astral-sh/uv), Homebrew tap, `.deb`, standalone binary |\n\n- **Two proxy modes**: forward proxy for cloud APIs (OpenAI, Anthropic, Copilot), reverse proxy for local LLM servers (Ollama, llama.cpp, vLLM)\n- **Context breakdown**: input tokens split into 8 categories: system prompt, tool 
definitions, tool results, file contents, conversation history, current user message, assistant prefill, uncategorised\n- **Live dashboard**: real-time charts and per-request detail with a visual block map of the context window\n- **Session tracking**: name and group requests by task to compare usage across runs\n- **SQLite storage**: all data stored locally in `~/.contextspy/`; no data leaves your machine\n- **Agent detection**: Copilot, Claude Desktop/Code, opencode, Cursor, and generic clients\n\nDocumentation:\n\n- [Installation](/RimantasZ/contextspy/blob/main/docs/install.md): PyPI, Homebrew, .deb, binary, CA certificate setup\n- [Cloud API mode](/RimantasZ/contextspy/blob/main/docs/cloud-mode.md): intercept OpenAI, Anthropic, Copilot, etc.\n- [Local LLM mode](/RimantasZ/contextspy/blob/main/docs/local-mode.md): intercept Ollama, llama-server, vLLM\n- [Usage examples](/RimantasZ/contextspy/blob/main/docs/examples.md): practical recipes and common workflows\n- [CLI reference](/RimantasZ/contextspy/blob/main/docs/cli.md): all commands and options\n- [Development](/RimantasZ/contextspy/blob/main/docs/development.md): architecture, data storage, contributing", "url": "https://wpnews.pro/news/understand-and-reduce-token-usage-with-contextspy-context-profiler", "canonical_source": "https://github.com/RimantasZ/contextspy", "published_at": "2026-06-15 19:29:06+00:00", "updated_at": "2026-06-15 19:34:56.654334+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "developer-tools", "ai-agents"], "entities": ["ContextSpy", "GitHub Copilot", "Claude Code", "opencode", "Homebrew", "Apple Silicon", "Opus", "gpt-5-nano"], "alternates": {"html": "https://wpnews.pro/news/understand-and-reduce-token-usage-with-contextspy-context-profiler", "markdown": "https://wpnews.pro/news/understand-and-reduce-token-usage-with-contextspy-context-profiler.md", "text": "https://wpnews.pro/news/understand-and-reduce-token-usage-with-contextspy-context-profiler.txt", "jsonld": 
"https://wpnews.pro/news/understand-and-reduce-token-usage-with-contextspy-context-profiler.jsonld"}}