# My AI agent got dumber mid-session. I measured the context window before blaming MCP.

> Source: <https://dev.to/rapls/my-ai-agent-got-dumber-mid-session-i-measured-the-context-window-before-blaming-mcp-4c3l>
> Published: 2026-06-17 02:07:22+00:00

There's a particular way an AI coding agent goes bad. Not a crash, not an error. It just gets duller. Halfway through a long session it forgets a constraint you set early, repeats a question you already answered, or starts giving you shorter, vaguer replies to the same kind of ask it handled well an hour ago. You can feel the quality sag without anything actually breaking.

My first instinct was to blame MCP. I had a few servers connected, I'd read that connected servers eat the context window, so the story wrote itself: too many tools loaded, no room left to think, of course it's drifting. I was about to start disconnecting things. Then I decided to measure first, and the measurement didn't say what I expected.

The agent I use can print a breakdown of what's currently filling the context window, by category. So before cutting anything, I looked at where the tokens were actually going. I'll give this in proportions rather than raw numbers, because the absolute figures depend on the model and window size, and the shape is the part that transfers.

Roughly, in a session that had started drifting:

The thing I was about to blame was near the bottom of the list. The thing I hadn't thought about, the plain accumulation of conversation, was the top.

I want to be careful here, because "MCP doesn't cost anything" would be the wrong lesson, and it's not what I found.

MCP can be heavy. A connected server can load its full tool schema and carry it on every turn, and if your client loads all of that up front, a handful of servers really can take a large bite out of the window before you type a word. That version of the warning is real, and plenty of people have measured it on their own setups. So if you connect many servers and your client front-loads their schemas, the usual advice to disconnect what you don't use is sound.

What I'd add is narrower: it depends on how your client loads tools. Some setups defer the schema and only pull a tool's definition in when it's actually needed. In a setup like that, idle connected servers cost much less than the worst-case number suggests, and on the session I measured, they weren't my bottleneck. The general claim "MCP is expensive" and my specific result "MCP wasn't what filled my window" aren't in conflict. They're about different loading behavior. The honest takeaway isn't "MCP is innocent," it's "don't assume which line item is the problem, because it varies by setup."

The slice that grew without me noticing was conversation history. It makes sense once you see it: every exchange stays in the window, and a long exploratory session piles up turn after turn until the early context is competing for space with the part the model needs right now. Nothing dramatic added it. It was just the steady weight of a long conversation, and it was the part I hadn't thought to look at because it didn't feel like a "feature" I'd switched on.

That reframed the drift for me. The agent wasn't getting dumber because of what I'd connected. It was getting dumber because I'd been having one very long conversation, and the room to reason was slowly filling with the transcript of that conversation.

None of the fixes are clever. They're just the things that follow once you know history is the heavy part.

I don't let one exploratory session run forever. When a thread of work is basically done, I start fresh instead of carrying the whole transcript into the next, unrelated task. When I do need continuity, I have the agent summarize where things stand and carry the summary into a new session, rather than dragging the entire history across. The point is to move the gist, not the full back-and-forth, because the full back-and-forth is exactly the weight I measured.

The mental model that stuck: the context window is a desk, not a filing cabinet. Everything you want the model to use at once has to fit on the desk's surface, and a long conversation slowly covers it with paper until there's no room to work. Clearing the desk is sometimes better than buying a bigger one.

If I'd followed my first instinct, I'd have disconnected a few servers, freed up a small slice, watched the drift continue, and learned nothing. The fix would have missed the cause, and I'd have blamed the tool I'd primed myself to blame.

So the thing I'm keeping isn't "history is always the culprit," because on someone else's setup it really might be the connected servers, or the memory files, or something I'm not thinking of. The thing I'm keeping is the order of operations: when the agent starts drifting, read the breakdown before you cut anything. The line item you're sure is the problem and the line item that's actually the problem are often not the same, and the only way to tell them apart is to look.

When the agent gets dull mid-session, don't reach for the explanation you already have. Measure first. Read where the tokens are actually going, fix the slice that's actually large, and accept that it varies by setup so last time's culprit isn't a rule. For me it was conversation history, so I keep sessions shorter and hand off a summary instead of a transcript. Next time it might be something else, which is the whole reason to look instead of guess.

*I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.*