Debugging AI Coding Agents: How to See Prompts, Tool Calls, Token Usage, and Cost

A developer built ccglass, an open-source local proxy and dashboard that lets engineers inspect the full request flow of AI coding agents, including system prompts, tool call schemas, tool results, per-request token usage, and cost. The tool addresses the difficulty of debugging agentic coding tools by revealing what is actually sent to the model, helping identify issues like context bloat, malformed tool calls, and provider errors.

When a coding agent fails, the visible error is rarely the whole story. You might see: 400 Bad Request The usual reaction is to tweak the prompt and try again. Sometimes that works. But for agentic coding tools, guessing is not enough. You need to inspect what the agent actually sent to the model. That is the problem ccglass is built for. GitHub: https://github.com/jianshuo/ccglass https://github.com/jianshuo/ccglass Modern coding agents are not simple chatbots. Tools like Claude Code, Codex, OpenCode, CodeBuddy, Qoder, and similar systems usually run a loop like this: php user request - model request - tool call - local command / file read / edit / search - tool result - next model request - final answer When something goes wrong, the bug can be in any part of that loop. For example: You cannot debug that reliably from the final answer alone. When an agent behaves strangely, I usually want to see five things. The system prompt often explains behavior that looks mysterious from the outside. It may contain rules about: If the agent ignores your instruction, first check whether the system prompt is pushing it in a different direction. Tool calling depends heavily on the schema sent to the model. If a tool is described vaguely, has confusing parameter names, or contains a schema shape the provider does not like, the model may choose the wrong tool or produce invalid arguments. This matters even more with MCP servers and custom tools. The question is not "what did my code define?" The real question is: What tool schema was actually sent in the model request? A tool call bug can come from the model, the client, or the provider adapter. You want to inspect: For example, if the model emits something that looks like a tool call but the client renders it as text, the agent may continue as if the tool ran even though no tool result exists. Tool results are often the hidden source of context bloat. A single file read, search result, stack trace, or command output can add thousands of tokens to the next turn. If the agent suddenly becomes expensive or confused, check what tool results were fed back into the model. Token totals are useful, but per-request token usage is better. You want to know: That is the difference between "this session was expensive" and "this specific tool result caused the spike." ccglass is a local proxy and dashboard for coding-agent traffic. It lets you inspect what supported agents actually send to the model: It works locally. It is open source. Install: npm install -g ccglass Start it: ccglass Or choose a client directly: ccglass claude ccglass codex ccglass opencode ccglass qoder ccglass codebuddy For generic OpenAI-compatible or Anthropic-compatible clients, you can also run proxy-only mode: ccglass proxy --provider openai ccglass proxy --provider claude Then point your client or IDE at the printed local base URL. Suppose an agent repeatedly fails to call a tool correctly. Instead of changing the prompt first, inspect the actual request flow: That gives you a factual answer to questions like: Another common problem: Why did this one coding-agent session use so many tokens? In ccglass, inspect the request list and session summary. Look for: Then use turn-to-turn diff to see what changed between two requests. This is often more useful than looking only at the final cost. Provider errors are another good use case. If an Anthropic-compatible or OpenAI-compatible endpoint rejects a request, you need the exact payload. Check: This is useful when working with: The failure is often not "the model is bad." It is often a request-shape problem. ccglass can export captured requests: ccglass export <session /<seq --format raw ccglass export <session /<seq --format md ccglass export <session /<seq --format json ccglass export <session /<seq --format har That is useful when reporting bugs to an agent project, provider, or proxy maintainer. Instead of saying: The agent failed. You can show: This exact request contained this tool schema, this model response emitted this malformed tool call, and this provider returned this error. That is much easier to debug. ccglass is not a universal network sniffer. It works best when the client can be pointed at a local base URL or local proxy. For example, API-key based OpenAI-compatible and Anthropic-compatible traffic is a good fit. Some clients have special transports. For example, Codex authenticated through ChatGPT login may use a WebSocket path that does not honor OPENAI BASE URL , so local base URL inspection will not see that traffic. For CodeBuddy, ccglass uses a forward-proxy mode because CodeBuddy hardcodes its upstream endpoint. As coding agents become more autonomous, debugging needs to move one layer deeper. It is no longer enough to ask: Did the agent produce the right diff? You also need to ask: What did the agent see, what tool did it choose, what result came back, and what context entered the next turn? That is what ccglass tries to make visible. GitHub: https://github.com/jianshuo/ccglass https://github.com/jianshuo/ccglass Install: npm install -g ccglass If you build with coding agents, request-level debugging is worth having in your toolbox.