I almost gave up on my AI assistant — here’s how I fixed context handling

wpnews.pro


Source: dev.to · Published: 2026-06-13 · Topic: artificial-intelligence


A developer built a personal AI assistant but struggled with context handling as conversations grew longer. They implemented a hierarchical context management system that keeps recent messages raw and periodically summarizes older history, solving token limits and memory issues.


I’ve been building a personal AI assistant for the past few months. You know the kind: you chat with it, it remembers what you said, and it helps with tasks like summarizing emails, answering questions about your notes, or just being a sounding board.

It started as a weekend project. A few Python scripts, an OpenAI-compatible API endpoint, and a simple loop in the terminal. I was smug. "Look, I built an AI!" But then things got ugly.

The moment I started having longer conversations, the bot became useless. It would forget what I said three messages ago, contradict itself, or start repeating the same advice. I was throwing more and more tokens at the API, and my wallet was crying. Something had to change.

My first attempt was trivial: append every new message to a list and send the whole history as the messages array to the API. That worked… for about 10 exchanges. Then token limits kicked in. The API started truncating the oldest messages, breaking the conversation flow.
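A quick back-of-the-envelope model shows why resending everything gets expensive. This sketch assumes every message costs the same number of tokens (a hypothetical 100) and that each exchange adds one user and one assistant message; real messages vary, so treat the numbers as illustrative only.

```python
def tokens_sent_full_history(exchanges: int, tokens_per_message: int = 100) -> int:
    """Total prompt tokens sent when the whole history is resent on every request.

    Simplifying assumption: every message has the same size. Request t carries
    the 2 * (t - 1) earlier messages plus the new user message.
    """
    total = 0
    for t in range(1, exchanges + 1):
        messages_in_request = 2 * (t - 1) + 1
        total += messages_in_request * tokens_per_message
    return total


# The sum of (2t - 1) for t = 1..n is n * n, so the total grows quadratically.
print(tokens_sent_full_history(10))  # 10000
print(tokens_sent_full_history(50))  # 250000
```

Under these assumptions, five times as many exchanges cost 25 times as many prompt tokens, and each individual request keeps growing until it no longer fits the model's context window.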

I tried a sliding window approach—keep only the last N messages. Better, but the assistant lost the long-term context. If I asked it to "remind me of that book I mentioned yesterday," it had no idea. I was essentially lobotomizing my bot every few turns.
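For reference, a sliding window fits in a few lines. This is a sketch, not the author's code: it assumes the history is a list of role/content dicts in the Chat Completions format and keeps a leading system message outside the window. The conversation contents are made up for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def sliding_window(history: List[Message], n: int) -> List[Message]:
    """Return a leading system message (if any) plus the last n conversation messages."""
    system = history[:1] if history and history[0]["role"] == "system" else []
    conversation = history[len(system):]
    # Guard n <= 0 explicitly: conversation[-0:] would return the whole list.
    recent = conversation[-n:] if n > 0 else []
    return system + recent


history = [{"role": "system", "content": "You are a helpful assistant."}]
history.append({"role": "user", "content": "I'm reading Dune."})
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

window = sliding_window(history, 4)
print([m["content"] for m in window])
# ['You are a helpful assistant.', 'question 3', 'answer 3', 'question 4', 'answer 4']
```

The message about the book drops out of the window after a couple of exchanges, which is exactly the loss of long-term context described above.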

Another dead end was summarizing earlier parts of the conversation on every turn. That worked technically, but it added latency and cost. Each turn, I had to re-summarize the entire history. Not sustainable.
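Counting how many messages pass through the summarizer makes the difference concrete. The sketch assumes a fixed raw window (6 messages in the example) and compares two hypothetical strategies: re-summarizing everything outside the window after each new message, versus summarizing each message once, when it leaves the window, and passing the previous summary along instead of the raw history.

```python
def summarized_every_turn(total_messages: int, window: int) -> int:
    """Messages fed to the summarizer if all out-of-window history is re-summarized after each new message."""
    return sum(max(0, k - window) for k in range(1, total_messages + 1))


def summarized_incrementally(total_messages: int, window: int) -> int:
    """Messages fed to the summarizer if each message is summarized once, as it leaves the window."""
    return max(0, total_messages - window)


print(summarized_every_turn(100, 6))     # 4465
print(summarized_incrementally(100, 6))  # 94
```

The incremental strategy also sends the running summary each time, but that summary is bounded in size, while the re-summarize-everything workload grows quadratically with the length of the conversation.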

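I needed a system that could avoid all three problems: stay within the token limit, keep long-term context, and avoid re-summarizing the whole history on every turn.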

This turned out to be a well-known pattern in conversational AI: hierarchical context management. I just didn't know the name then.

Here’s the high-level design:

[Messages]
  ├─ Recent (last 5-10 messages) → passed raw to the API
  └─ Older history → periodically summarized into a static summary string

The key insight is that you don’t need to summarize after every message. You only need to rotate the summary when the conversation has grown enough to push out important content. For my use case, I set a threshold: once the recent window exceeds 6 messages AND the oldest message in that window is older than X minutes, I trigger a summarization. The class below simplifies the time condition to a cooldown: a summarization only runs if at least 60 seconds have passed since the previous one.
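The trigger as described (window over the limit AND oldest message older than some threshold) can be written as a small predicate. This is a sketch under assumptions: each message carries a hypothetical created_at Unix timestamp, and max_age_s (600 seconds here) stands in for the unspecified X minutes.

```python
import time
from typing import Dict, List, Optional, Union

TimedMessage = Dict[str, Union[str, float]]


def should_summarize(window: List[TimedMessage], max_recent: int = 6,
                     max_age_s: float = 600.0, now: Optional[float] = None) -> bool:
    """True when the raw window is too long AND its oldest message is older than max_age_s."""
    if len(window) <= max_recent:
        return False
    now = time.time() if now is None else now
    # Use min() rather than window[0] in case timestamps arrive out of order.
    oldest = min(float(m["created_at"]) for m in window)
    return now - oldest > max_age_s


msgs = [{"role": "user", "content": f"m{i}", "created_at": 1000.0 + i} for i in range(7)]
print(should_summarize(msgs, now=1500.0))  # False: oldest message is only 500 s old
print(should_summarize(msgs, now=1700.0))  # True: 7 > 6 messages and 700 s > 600 s
```

Compared with a cooldown, this ties summarization to how stale the window content actually is rather than to how recently the last summary ran.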

Here’s the Python class that implements this:

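import time
from typing import Dict, List


class ContextManager:
    def __init__(self, max_recent: int = 6, summary: str = "",
                 keep_raw: int = 4, cooldown_s: float = 60.0):
        if not 0 < keep_raw <= max_recent:
            raise ValueError("keep_raw must be between 1 and max_recent")
        self.max_recent = max_recent  # summarize once the raw window grows past this
        self.keep_raw = keep_raw      # messages kept verbatim after each rotation
        self.cooldown_s = cooldown_s  # minimum seconds between two summarizations
        self.summary = summary
        self.recent_messages: List[Dict] = []
        self.last_summary_time = time.time()

    def add_message(self, role: str, content: str):
        self.recent_messages.append({"role": role, "content": content})
        if len(self.recent_messages) > self.max_recent:
            self._maybe_summarize()

    def _maybe_summarize(self):
        # During the cooldown the raw window is allowed to grow past max_recent.
        if time.time() - self.last_summary_time < self.cooldown_s:
            return
        older = self.recent_messages[:-self.keep_raw]  # everything except the newest keep_raw
        if older:
            new_summary = self._summarize_messages(older)
            self.summary = new_summary if new_summary else self.summary
            self.recent_messages = self.recent_messages[-self.keep_raw:]
            self.last_summary_time = time.time()

    def _summarize_messages(self, msgs: List[Dict]) -> str:
        # Placeholder: no model call, just text. The previous summary is folded in
        # so earlier rotations are not thrown away, and the result is capped at the
        # last 500 characters. Replace this with a real summarization request.
        parts = [self.summary] if self.summary else []
        parts += [f'{m["role"]}: {m["content"]}' for m in msgs]
        return "\n".join(parts)[-500:]

    def build_context(self, system_prompt: str) -> List[Dict]:
        content = system_prompt
        if self.summary:
            content += f"\nSummary of earlier conversation: {self.summary}"
        return [{"role": "system", "content": content}] + self.recent_messages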

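This class builds the context array that you send to the API. The system prompt now includes a compressed summary, and the recent messages are passed raw. The trade-off is that the summary can lose nuance, but for my use cases it has been good enough. Note that _summarize_messages is only a placeholder that concatenates and truncates text; a real version should ask the model to write the summary.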
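Here is a sketch of a model-backed summarizer that could replace the truncation placeholder. It is an assumption, not the author's code: the prompt wording, the 120-word limit, and the llm_summarize name are hypothetical, and the API call is injected as a complete function so the logic can be tested without a network. It merges the previous summary with the messages being rotated out, so older rotations are not lost.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
# complete() takes chat messages and returns the model's reply text.
Complete = Callable[[List[Message]], str]


def llm_summarize(previous_summary: str, msgs: List[Message],
                  complete: Complete, max_words: int = 120) -> str:
    """Ask the model to merge the previous summary with the messages being rotated out."""
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in msgs)
    prompt = (
        f"Previous summary:\n{previous_summary or '(none)'}\n\n"
        f"New messages:\n{transcript}\n\n"
        f"Write an updated summary in at most {max_words} words. "
        "Keep names, facts, decisions, and open questions."
    )
    reply = complete([
        {"role": "system", "content": "You summarize conversations for later recall."},
        {"role": "user", "content": prompt},
    ])
    # Keep the old summary if the model returns nothing useful.
    return reply.strip() or previous_summary


# Wiring it to an OpenAI-compatible endpoint with the openai>=1.0 client might look like:
#   from openai import OpenAI
#   client = OpenAI(base_url="https://ai.interwestinfo.com/v1")
#   def complete(messages):
#       r = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
#       return r.choices[0].message.content or ""
```

Inside the class, _summarize_messages could return llm_summarize(self.summary, msgs, complete), optionally keeping a character cap as a safety net.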

Here’s how I hook it into an actual OpenAI-compatible API (I used the endpoint from ai.interwestinfo.com in my config):

from openai import OpenAI  # openai>=1.0; openai.ChatCompletion was removed in 1.0

# The API key is read from the OPENAI_API_KEY environment variable.
client = OpenAI(base_url="https://ai.interwestinfo.com/v1")  # my custom endpoint

context = ContextManager(max_recent=6)
user_input = "What were we discussing about the book?"
context.add_message("user", user_input)

messages = context.build_context("You are a helpful assistant.")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
)
assistant_reply = response.choices[0].message.content
context.add_message("assistant", assistant_reply)

This pattern worked for me. The bot now remembers key points from ten minutes ago, and I’m no longer going broke on tokens.

max_recent, the summarization threshold, and the summary length are all knobs you can turn. Start small and increase them until you reach an acceptable balance between quality and cost.

If I were to start over, I’d build the summarization step as an async background job. Right now, the _maybe_summarize call blocks the main thread when it triggers. Not a big deal for a CLI assistant, but for a web app with many concurrent users, that’s a problem.
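One way to take summarization off the request path is an asyncio background task. This is a hypothetical sketch, not the author's code: the BackgroundContext name, the async summarize callable, and the one-job-at-a-time policy are assumptions. add_message must be called from inside a running event loop, because it uses asyncio.create_task.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Message = Dict[str, str]
# summarize(previous_summary, messages) -> new summary, e.g. an async model call.
Summarizer = Callable[[str, List[Message]], Awaitable[str]]


class BackgroundContext:
    def __init__(self, summarize: Summarizer, max_recent: int = 6, keep_raw: int = 4):
        self.summarize = summarize
        self.max_recent = max_recent
        self.keep_raw = keep_raw
        self.summary = ""
        self.recent: List[Message] = []
        self._task: Optional[asyncio.Task] = None  # strong reference to the running job

    def add_message(self, role: str, content: str) -> None:
        self.recent.append({"role": role, "content": content})
        idle = self._task is None or self._task.done()
        if len(self.recent) > self.max_recent and idle:
            # Snapshot what to summarize; the caller does not wait for the result.
            older = self.recent[:-self.keep_raw]
            self._task = asyncio.create_task(self._rotate(older))

    async def _rotate(self, older: List[Message]) -> None:
        self.summary = await self.summarize(self.summary, older)
        # add_message only appends, so the first len(older) items are still `older`.
        self.recent = self.recent[len(older):]

    async def wait_idle(self) -> None:
        """Wait for any in-flight summarization (useful in tests and at shutdown)."""
        if self._task is not None:
            await self._task
```

In a web app, the request handler would call add_message and return the reply immediately; the summary catches up shortly afterwards. While a job is running, the raw window may temporarily grow past max_recent.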

I’d also check the summary length against the model’s token limit before each request. In my current version, a long summary can leave too little room in the context window, and the recent messages are what gets cut. I need to enforce a token budget for the whole request.
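A token budget can be enforced right before each request. The sketch below is an assumption, not the author's code: fit_to_budget and approx_tokens are hypothetical helpers, tokens are estimated with a rough four-characters-per-token heuristic, the summary is capped first, and then the oldest recent messages are dropped until the request fits.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def approx_tokens(text: str) -> int:
    """Rough estimate of about 4 characters per token; use a real tokenizer for accuracy."""
    return (len(text) + 3) // 4


def fit_to_budget(system_prompt: str, summary: str, recent: List[Message],
                  budget: int, max_summary_tokens: int = 300,
                  count: Callable[[str], int] = approx_tokens) -> List[Message]:
    """Build request messages whose estimated size stays within `budget` tokens.

    Per-message formatting overhead is ignored, and the newest message is always
    kept even if it alone exceeds the budget.
    """
    # 1. Cap the summary so it cannot crowd out the recent messages.
    while summary and count(summary) > max_summary_tokens:
        summary = summary[len(summary) // 10 + 1:]  # drop roughly the first 10%
    content = system_prompt
    if summary:
        content += f"\nSummary of earlier conversation: {summary}"
    system = {"role": "system", "content": content}

    # 2. Drop the oldest raw messages until the whole request fits.
    kept = list(recent)  # copy, so the caller's list is not modified
    while len(kept) > 1 and sum(count(m["content"]) for m in [system] + kept) > budget:
        kept.pop(0)
    return [system] + kept
```

The budget should leave room for the model's reply as well. With tiktoken installed, count could be lambda s: len(enc.encode(s)) where enc = tiktoken.encoding_for_model("gpt-3.5-turbo").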

Finally, I’d persist the context explicitly in a database. Right now it lives only in memory, so if the server restarts, the assistant forgets everything. A simple Redis store would fix that.
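Persistence can be as small as serializing the summary and the recent messages to JSON under a per-session key. This sketch assumes a Redis-like store exposing get and set (the redis-py Redis client has both); the save_context/load_context helpers and the ctx:<session_id> key scheme are hypothetical, and InMemoryStore stands in for Redis so the example runs without a server.

```python
import json
from typing import Dict, List, Optional, Protocol, Tuple, Union

Message = Dict[str, str]


class KeyValueStore(Protocol):
    """The subset of the redis-py client used here: get(name) and set(name, value)."""
    def get(self, name: str) -> Optional[Union[bytes, str]]: ...
    def set(self, name: str, value: str) -> object: ...


def save_context(store: KeyValueStore, session_id: str,
                 summary: str, recent: List[Message]) -> None:
    payload = json.dumps({"summary": summary, "recent": recent})
    store.set(f"ctx:{session_id}", payload)  # hypothetical key scheme


def load_context(store: KeyValueStore, session_id: str) -> Tuple[str, List[Message]]:
    raw = store.get(f"ctx:{session_id}")
    if raw is None:  # nothing saved yet: start fresh
        return "", []
    data = json.loads(raw)  # json.loads accepts both bytes and str
    return data["summary"], data["recent"]


class InMemoryStore:
    """Stand-in for Redis; redis.Redis(host="localhost", port=6379) offers the same get/set."""
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)

    def set(self, name: str, value: str) -> bool:
        self._data[name] = value
        return True
```

After a restart, the context manager's summary and recent messages could be restored from load_context before handling the next message.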

I’m curious how other devs solve this. Do you use a fixed token window? A vector store? Or do you rely on the model’s internal memory (and pay the price)? Let me know in the comments—I’d love to compare notes.


Read original on dev.to → dev.to/__c1b9e06dc90a7e0a676b/i-almost-gave-up-o…
