[ARTICLE · art-13866] src=dev.to pub= topic=ai-agents verified=true sentiment=↑ positive

My agent kept hitting context limits. This one function fixed it.

A developer created `agent-message-trim`, a Python library that solves context window limits in AI agents by intelligently trimming conversation history while preserving paired tool_use and tool_result messages. The function ensures trimmed message histories remain valid for Anthropic's API, preventing request rejections that occur when tool calls are separated from their results. The library supports multiple trimming strategies, including dropping oldest messages or removing from the middle of conversations.

2 min read · published May 25, 2026

This is a submission for the Hermes Agent Challenge.

My Hermes research agent was failing after about 40 turns. The cause: conversation history growing past the context window. The fix everyone reaches for is "just drop old messages", but if you drop a `tool_use` without its matching `tool_result`, Anthropic's API rejects the whole request.
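To make the failure concrete, here is a minimal sketch of a pairing check. It is not part of `agent-message-trim`; the helper name `find_unpaired_tool_ids` and the tool name and payloads in the sample history are made up for illustration. It only checks that ids match up, which is a simplification of what the API actually validates.

```python
def find_unpaired_tool_ids(messages):
    """Return (tool_use ids with no result, tool_result ids with no call)."""
    used, answered = set(), set()
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no tool blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                used.add(block["id"])
            elif block.get("type") == "tool_result":
                answered.add(block["tool_use_id"])
    return used - answered, answered - used


# Hypothetical history; the tool name and payloads are placeholders.
history = [
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_001", "name": "search", "input": {"q": "X"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_001", "content": "3 hits"},
    ]},
    {"role": "assistant", "content": "Here is what I found."},
]

# Naive trimming: drop the two oldest messages and keep the rest.
naive = history[2:]

print(find_unpaired_tool_ids(history))  # (set(), set())
print(find_unpaired_tool_ids(naive))    # (set(), {'call_001'})
```

The naive slice keeps the `tool_result` but loses the `tool_use` that produced it, which is exactly the kind of orphan that gets a request rejected.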

I needed something smarter. That's `agent-message-trim`.

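import anthropic
from agent_message_trim import trim_messages

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Trim the history to the token budget before each request
result = trim_messages(messages, max_tokens=4000)

response = client.messages.create(
    model="claude-sonnet-4-5",
    messages=result.messages,
    # ... remaining arguments (the API also requires max_tokens for the reply)
)

print(f"Dropped {result.dropped_count} messages to fit")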

This is the part that matters. If your history looks like this:

[
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "call_001", ...}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "call_001", ...}]},
    {"role": "assistant", "content": "Here is what I found."},
]

`trim_messages` never drops the `tool_use` without also dropping its `tool_result`. They move as a unit. The conversation you get back is always API-valid.
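One way the "move as a unit" idea could be implemented is sketched below. This is an assumption about the general technique, not the library's actual code: `group_units`, `drop_oldest_units`, and the per-message `estimate_tokens` are hypothetical names, and the sample tool payloads are placeholders. The sketch also does not re-check role ordering after trimming.

```python
import json


def estimate_tokens(message):
    # Rough estimate: about four characters per token on the serialized content.
    content = message["content"]
    text = content if isinstance(content, str) else json.dumps(content)
    return max(1, (len(text) + 3) // 4)


def has_block(message, block_type):
    content = message.get("content")
    return isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == block_type for b in content
    )


def group_units(messages):
    """Group a tool_use message with the tool_result message that follows it."""
    units, i = [], 0
    while i < len(messages):
        is_pair = (
            has_block(messages[i], "tool_use")
            and i + 1 < len(messages)
            and has_block(messages[i + 1], "tool_result")
        )
        step = 2 if is_pair else 1
        units.append(messages[i:i + step])
        i += step
    return units


def drop_oldest_units(messages, max_tokens):
    """Drop whole units from the front until the estimate fits the budget."""
    units = group_units(messages)
    total = sum(estimate_tokens(m) for unit in units for m in unit)
    while len(units) > 1 and total > max_tokens:
        removed = units.pop(0)
        total -= sum(estimate_tokens(m) for m in removed)
    return [m for unit in units for m in unit]


history = [
    {"role": "user", "content": "search for X"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "call_001", "name": "search", "input": {"q": "X"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "call_001", "content": "3 hits"},
    ]},
    {"role": "assistant", "content": "Here is what I found."},
]

# A budget that would fit the tool_result plus the final answer, but not the pair.
budget = estimate_tokens(history[2]) + estimate_tokens(history[3])
trimmed = drop_oldest_units(history, budget)
```

With that budget a message-by-message trim would keep the orphaned `tool_result`; the unit-based version drops the whole pair and keeps only the final assistant message.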

result = trim_messages(messages, max_tokens=4000, keep_system=True)
result = trim_messages(messages, max_tokens=4000, strategy="drop_oldest")

result = trim_messages(messages, max_tokens=4000, strategy="drop_middle")

`drop_middle` is useful when you want to keep the original task context AND the most recent exchange, but can sacrifice the middle of a long conversation.
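As an illustration of the idea behind a middle-dropping strategy, the sketch below keeps the first group of messages and the most recent ones, and removes groups right after the head until the estimate fits. This is one plausible interpretation, not the library's implementation; `drop_middle_units`, `group_cost`, and the sample turns are hypothetical, and the groups are assumed to already keep tool pairs together.

```python
def estimate_tokens(text):
    # Same rough four-characters-per-token heuristic the article quotes.
    return max(1, (len(text) + 3) // 4)


def group_cost(group):
    # Each group is a list of messages with plain-string content in this sketch.
    return sum(estimate_tokens(m["content"]) for m in group)


def drop_middle_units(units, max_tokens, cost=group_cost):
    """Keep the first and the latest groups; drop from just after the head."""
    kept = list(units)
    total = sum(cost(u) for u in kept)
    while len(kept) > 2 and total > max_tokens:
        total -= cost(kept.pop(1))  # oldest group that is not the original task
    return kept


turns = [
    [{"role": "user", "content": "Summarize the Q3 report."}],
    [{"role": "assistant", "content": "x" * 400}],
    [{"role": "user", "content": "y" * 400}],
    [{"role": "assistant", "content": "Done, summary attached."}],
]

kept = drop_middle_units(turns, max_tokens=50)
# kept now holds only the original task and the latest reply.
```

Dropping from the middle can leave two same-role messages next to each other, so a real implementation would need to decide how to handle that.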

The built-in estimator is `max(1, (len(text)+3)//4)`. Plug in your own:

import tiktoken
enc = tiktoken.encoding_for_model("gpt-4o")

result = trim_messages(
    messages,
    max_tokens=4000,
    count_tokens=lambda text: len(enc.encode(text)),
)
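For reference, the built-in formula quoted above amounts to a ceiling division by four with a floor of one token. The function name `estimate_tokens` below is only for illustration; the expression itself is the one given in the article.

```python
def estimate_tokens(text: str) -> int:
    # ceil(len(text) / 4), but never less than 1
    return max(1, (len(text) + 3) // 4)


print(estimate_tokens(""))         # 1
print(estimate_tokens("abcd"))     # 1
print(estimate_tokens("abcde"))    # 2
print(estimate_tokens("a" * 400))  # 100
```

A character-based heuristic like this is cheap but approximate, which is the reason to pass a real tokenizer through `count_tokens` when the budget is tight.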

# Fields on the returned result object:
result = trim_messages(messages, max_tokens=4000)
result.messages        # trimmed list
result.token_count     # estimated tokens used
result.original_count  # how many messages came in
result.dropped_count   # how many were removed
result.ok              # True if nothing was dropped
result.kept_count      # len(result.messages)
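The relationships between those fields can be sketched with a small dataclass. This is a guess at a compatible shape based on the comments above, not the library's actual class: the name `TrimResult` and the choice of which fields are stored versus derived are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TrimResult:  # hypothetical name, not confirmed by the article
    messages: list       # trimmed list
    token_count: int     # estimated tokens used
    original_count: int  # how many messages came in

    @property
    def kept_count(self) -> int:
        return len(self.messages)

    @property
    def dropped_count(self) -> int:
        return self.original_count - self.kept_count

    @property
    def ok(self) -> bool:
        # True if nothing was dropped
        return self.dropped_count == 0


example = TrimResult(
    messages=[{"role": "user", "content": "hi"}],
    token_count=1,
    original_count=3,
)
print(example.kept_count, example.dropped_count, example.ok)  # 1 2 False
```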

# Alternative entry point: trim_to_fit
from agent_message_trim import trim_to_fit

trimmed = trim_to_fit(messages, max_tokens=4000)

Standard library only: `json`, `dataclasses`. Nothing else.

pip install agent-message-trim