Source: dev.to · Topic: artificial-intelligence

I almost gave up on my AI assistant — here’s how I fixed context handling

A developer built a personal AI assistant but struggled with context handling as conversations grew longer. They implemented a hierarchical context management system that keeps recent messages raw and periodically summarizes older history, solving token limits and memory issues.

4 min read · Published Jun 13, 2026

I’ve been building a personal AI assistant for the past few months. You know the kind: you chat with it, it remembers what you said, and it helps with tasks like summarizing emails, answering questions about your notes, or just being a sounding board.

It started as a weekend project. A few Python scripts, an OpenAI-compatible API endpoint, and a simple loop in the terminal. I was smug. "Look, I built an AI!" But then things got ugly.

The moment I started having longer conversations, the bot became useless. It would forget what I said three messages ago, contradict itself, or start repeating the same advice. I was throwing more and more tokens at the API, and my wallet was crying. Something had to change.

My first attempt was trivial: just append every new message to a list and send the whole history as the messages array to the API. That worked… for about 10 exchanges. Then token limits kicked in. The API started truncating the oldest messages, breaking the conversation flow.
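To see why resending the whole history gets expensive, here is a rough back-of-the-envelope sketch. It assumes every message is about the same size (tokens_per_message is a made-up average, not something I measured) and that request k carries k messages.

```python
def cumulative_input_tokens(requests: int, tokens_per_message: int) -> int:
    """Total input tokens sent when every request resends the full history.

    Request k carries k messages, so the total is
    tokens_per_message * (1 + 2 + ... + requests).
    """
    return tokens_per_message * requests * (requests + 1) // 2


# Assumed average of 100 tokens per message.
after_10 = cumulative_input_tokens(10, 100)   # 5500
after_40 = cumulative_input_tokens(40, 100)   # 82000
```

Four times as many requests costs roughly fifteen times as many input tokens under these assumptions, which is the quadratic growth that makes this approach hurt.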

I tried a sliding window approach—keep only the last N messages. Better, but the assistant lost the long-term context. If I asked it to "remind me of that book I mentioned yesterday," it had no idea. I was essentially lobotomizing my bot every few turns.
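A minimal sketch of that sliding window, to show exactly where the memory loss comes from. The class name and the sample messages are hypothetical; the only real mechanism here is collections.deque with maxlen, which discards the oldest item once it is full.

```python
from collections import deque
from typing import Deque, Dict, List


class SlidingWindow:
    """Keeps only the newest `max_messages` messages; older ones are dropped."""

    def __init__(self, max_messages: int = 6) -> None:
        self.messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.messages.append({"role": role, "content": content})

    def as_list(self) -> List[Dict[str, str]]:
        return list(self.messages)


window = SlidingWindow(max_messages=3)
window.add("user", "I'm reading Dune.")
for i in range(3):
    window.add("user", f"unrelated message {i}")

# The book mention has already fallen out of the window.
remembered = any("Dune" in m["content"] for m in window.as_list())
```

Nothing in the window records that the dropped message ever existed, so the model has no way to answer questions about it.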

Another dead end was summarizing earlier parts of the conversation on every turn. That worked technically, but it added latency and cost. Each turn, I had to re-summarize the entire history. Not sustainable.

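I needed a system that could keep the most recent messages verbatim, retain the gist of older history, and avoid re-summarizing everything on every turn.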

This turned out to be a well-known pattern in conversational AI: hierarchical context management. I just didn't know the name then.

Here’s the high-level design:

[Messages]
  ├─ Recent (last 5-10 messages) → passed raw to the API
  └─ Older history → periodically summarized into a static summary string

The key insight is that you don’t need to summarize after every message. You only need to rotate the summary when the conversation has grown enough to push out important content. For my use case, I set a threshold: once the recent window exceeds 6 messages AND the oldest message in that window is older than X minutes, I trigger a summarization.
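The trigger described above could be sketched like this. The article leaves "X minutes" open, so MAX_AGE_SECONDS is an assumed value, and the "ts" timestamp key on each message is also an assumption made for illustration.

```python
import time
from typing import Dict, List, Optional

MAX_RECENT = 6            # window size mentioned in the text
MAX_AGE_SECONDS = 5 * 60  # assumed stand-in for "X minutes"


def should_summarize(messages: List[Dict], now: Optional[float] = None) -> bool:
    """True when the window is over MAX_RECENT AND its oldest message is stale.

    Each message is assumed to carry a "ts" key holding a Unix timestamp.
    """
    if len(messages) <= MAX_RECENT:
        return False
    now = time.time() if now is None else now
    return now - messages[0]["ts"] > MAX_AGE_SECONDS
```

Requiring both conditions means a quick burst of short messages does not trigger a summarization call, while a long-running conversation eventually does.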

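Here’s the Python class that implements this. It uses a simpler time gate than the one described above: at most one summarization every 60 seconds.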

import time
from typing import Dict, List


class ContextManager:
    def __init__(self, max_recent: int = 6, summary: str = ""):
        self.max_recent = max_recent
        self.summary = summary
        self.recent_messages: List[Dict] = []
        self.last_summary_time = time.time()

    def add_message(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.max_recent:
            self._maybe_summarize()

    def _maybe_summarize(self):
        # Rate limit: summarize at most once every 60 seconds.
        if time.time() - self.last_summary_time < 60:
            return
        # Keep the newest (max_recent - 2) messages raw; fold the rest away.
        keep = max(self.max_recent - 2, 1)
        older = self.recent_messages[:-keep]
        if older:
            new_summary = self._summarize_messages(older)
            self.summary = new_summary if new_summary else self.summary
            self.recent_messages = self.recent_messages[-keep:]
            self.last_summary_time = time.time()

    def _summarize_messages(self, msgs: List[Dict]) -> str:
        # Placeholder: this truncates instead of calling a model.
        # Start from the previous summary so older context is not overwritten.
        lines = [self.summary] if self.summary else []
        lines += [f'{m["role"]}: {m["content"]}' for m in msgs]
        text = "\n".join(lines)
        return text[-500:]  # keep the most recent 500 characters

    def build_context(self, system_prompt: str) -> List[Dict]:
        content = system_prompt
        if self.summary:
            # Only mention a summary once there is one.
            content += f"\nSummary of earlier conversation: {self.summary}"
        return [{"role": "system", "content": content}] + self.recent_messages

This class builds the context array that you send to the API. The system prompt now includes a compressed summary, and the recent messages are raw. Note that _summarize_messages here only truncates text; in practice you would swap in a model call. The trade-off? The summary can lose nuance, but it has been good enough for most of my conversations.
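If you want the summary step to use a model, one possible shape is below. This is a sketch, not the code from my assistant: complete is a hypothetical hook (a function that takes a prompt string and returns text) that you would wire to your own client, and the prompt wording and max_chars default are assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical hook: takes a prompt string, returns the model's text.
Complete = Callable[[str], str]


def summarize_messages(previous_summary: str, msgs: List[Dict[str, str]],
                       complete: Complete, max_chars: int = 500) -> str:
    """Fold `msgs` into `previous_summary` using a model call."""
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in msgs)
    prompt = (
        "Update the running summary of a conversation.\n"
        f"Current summary:\n{previous_summary or '(none)'}\n\n"
        f"New messages:\n{transcript}\n\n"
        f"Reply with the updated summary in at most {max_chars} characters."
    )
    result = complete(prompt).strip()
    # Fall back to the old summary if the model returns nothing, and hard-cap
    # the length in case it ignores the instruction.
    return (result or previous_summary)[:max_chars]
```

Passing the previous summary into the prompt is what lets facts from early in the conversation survive several rotations instead of being overwritten.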

Here’s how I hook it into an actual OpenAI-compatible API (I used the endpoint from ai.interwestinfo.com in my config):

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI(base_url="https://ai.interwestinfo.com/v1")  # my custom endpoint

context = ContextManager(max_recent=6)
user_input = "What were we discussing about the book?"
context.add_message("user", user_input)

messages = context.build_context("You are a helpful assistant.")

# openai>=1.0 client API; the older openai.ChatCompletion.create call was removed.
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
assistant_reply = response.choices[0].message.content
context.add_message("assistant", assistant_reply)

This pattern worked for me. The bot now remembers key points from ten minutes ago, and I’m not going bankrupt on tokens.

max_recent, the threshold for summarization, and the summary length are all knobs you can turn. Start small and increase until you meet your quality/cost balance.

If I were to start over, I’d build the summarization step as an async background job. Right now, the _maybe_summarize call blocks the main thread when it triggers. Not a big deal for a CLI assistant, but for a web app with many concurrent users, that’s a problem.
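One way the background version might look with asyncio. The AsyncContext class, its flush helper, and the sleep that stands in for a slow model call are all hypothetical; the point is only that add_message schedules the work and returns immediately.

```python
import asyncio
from typing import Dict, List, Optional


class AsyncContext:
    def __init__(self, max_recent: int = 6) -> None:
        self.max_recent = max_recent
        self.summary = ""
        self.recent: List[Dict[str, str]] = []
        self._task: Optional[asyncio.Task] = None

    async def _summarize(self, older: List[Dict[str, str]]) -> None:
        await asyncio.sleep(0.01)  # stand-in for a slow model call
        text = " ".join(m["content"] for m in older)
        self.summary = f"{self.summary} {text}".strip()[-500:]

    def add_message(self, role: str, content: str) -> None:
        """Must be called from inside a running event loop."""
        self.recent.append({"role": role, "content": content})
        busy = self._task is not None and not self._task.done()
        if len(self.recent) > self.max_recent and not busy:
            keep = max(self.max_recent - 2, 1)
            older, self.recent = self.recent[:-keep], self.recent[-keep:]
            # Schedule the summarization and return without waiting for it.
            self._task = asyncio.create_task(self._summarize(older))

    async def flush(self) -> None:
        """Wait for any in-flight summarization to finish."""
        if self._task is not None:
            await self._task


async def demo() -> AsyncContext:
    ctx = AsyncContext(max_recent=6)
    for i in range(7):
        ctx.add_message("user", f"msg {i}")
    await ctx.flush()
    return ctx


ctx = asyncio.run(demo())
```

In this sketch the older messages leave the raw window before their summary is ready, so a request sent in that short gap would miss them. A fuller version would probably keep them around until the task finishes.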

I’d also pre-validate the summary length against the model’s token limit. In my current version, the summary can grow beyond the system prompt slot, causing the API to truncate the recent messages. I need to enforce a token budget.
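A token budget could be enforced along these lines. This is a sketch under loud assumptions: the 4-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and fit_to_budget and summary_share are names I made up for illustration.

```python
from typing import Dict, List, Tuple


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fit_to_budget(summary: str, recent: List[Dict[str, str]],
                  budget: int, summary_share: float = 0.3
                  ) -> Tuple[str, List[Dict[str, str]]]:
    """Trim the summary, then drop the oldest messages, to stay within `budget`."""
    summary_cap = int(budget * summary_share)
    if estimate_tokens(summary) > summary_cap:
        summary = summary[: summary_cap * 4]  # inverse of estimate_tokens
    remaining = budget - (estimate_tokens(summary) if summary else 0)

    kept: List[Dict[str, str]] = []
    # Walk from newest to oldest so the latest turns survive.
    for msg in reversed(recent):
        cost = estimate_tokens(msg["content"])
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    kept.reverse()
    return summary, kept
```

Capping the summary at a fixed share of the budget means it can never crowd out the recent messages entirely. For real use, swap estimate_tokens for the tokenizer that matches your model.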

Finally, I’d add explicit persistence. Right now the context lives only in memory, so if the server restarts, the assistant forgets everything. A simple Redis store would fix that.

I’m curious how other devs solve this. Do you use a fixed token window? A vector store? Or do you rely on the model’s internal memory (and pay the price)? Let me know in the comments—I’d love to compare notes.
