99. Build a Chatbot With Memory

wpnews.pro

You ask a chatbot: "What's the capital of France?"

It says: "Paris."

You ask: "What's the population there?"

It says: "Where?"

That's a stateless chatbot. Every message is treated as a completely new conversation. It has no idea what "there" refers to. It has no memory.

Real conversation doesn't work like this. Context carries forward. References accumulate. The chatbot needs to know what came before.

This post builds a chatbot with memory. One that knows what you said two messages ago, what topic you're discussing, and what decisions were made earlier.

Every time you call an LLM API, it starts fresh. It has zero memory of previous calls. The only context it has is what you put in the current prompt.

The trick that makes chatbots work: you include the entire conversation history in every prompt.

Turn 1:
  USER: What's the capital of France?
  → Send to LLM: "User: What's the capital of France?"
  → LLM replies: "Paris"

Turn 2:
  USER: What's the population there?
  → Send to LLM:
      "User: What's the capital of France?
       Assistant: Paris.
       User: What's the population there?"
  → LLM sees full context, knows "there" = Paris

Turn 3:
  → Send EVERYTHING from turns 1, 2, and now 3

Every message appends to a growing list. That list goes into every subsequent prompt. The LLM can refer back to it because it's in the current context.

Simple. But it has a hard limit: the context window.

Every LLM has a maximum number of tokens it can process at once. GPT-3.5-turbo: 16k tokens. GPT-4: 128k tokens. LLaMA-7B: 4k tokens.

A long conversation fills up that window. When the conversation exceeds the limit, you can't just include everything. You need a strategy.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def estimate_conversation_tokens(messages: list) -> int:
    total = 0
    for msg in messages:
        total += estimate_tokens(msg['content'])
        total += 4   # overhead per message (role, formatting)
    return total

messages = []
example_turns = [
    ("user", "Tell me about machine learning."),
    ("assistant", "Machine learning is a field of artificial intelligence that enables computers to learn from data without being explicitly programmed. It includes supervised learning, where models are trained on labeled examples, unsupervised learning, where patterns are found without labels, and reinforcement learning, where agents learn through trial and error."),
    ("user", "What about deep learning specifically?"),
    ("assistant", "Deep learning is a subset of machine learning that uses neural networks with many layers. These networks learn hierarchical representations of data, making them especially powerful for images, audio, and text. The transformer architecture, introduced in 2017, has become the foundation for most modern deep learning systems."),
    ("user", "Can you give me examples of real applications?"),
    ("assistant", "Sure! Real applications include image classification in medical diagnosis, natural language processing for translation and chatbots, recommendation systems on Netflix and Spotify, fraud detection in banking, and autonomous driving. Deep learning powers most of these through pattern recognition at scale."),
]

print(f"{'Turn':<6} {'New tokens':<14} {'Total tokens':<14} {'% of 4k limit'}")
print("-" * 50)
for role, content in example_turns:
    messages.append({'role': role, 'content': content})
    total = estimate_conversation_tokens(messages)
    new   = estimate_tokens(content)
    print(f"{len(messages):<6} {new:<14} {total:<14} {total/4000:.1%}")

Output:

Turn   New tokens     Total tokens   % of 4k limit
--------------------------------------------------
1      12             16             0.4%
2      73             93             2.3%
3      13             110            2.8%
4      65             179            4.5%
5      15             198            5.0%
6      72             274            6.9%

A long conversation about a complex topic can easily hit 2000-3000 tokens. Add RAG context and system prompts, and you're at the limit fast.

Keep only the last N messages. Simple and effective.

from collections import deque
from typing import List, Optional

class SlidingWindowChatbot:
    def __init__(self, model_pipeline, window_size: int = 10,
                 system_prompt: str = "You are a helpful assistant."):
        self.model         = model_pipeline
        self.window_size   = window_size  # max messages to keep
        self.system_prompt = system_prompt
        self.history       = deque(maxlen=window_size)

    def chat(self, user_message: str) -> str:
        self.history.append({'role': 'user', 'content': user_message})

        messages = [
            {'role': 'system', 'content': self.system_prompt}
        ] + list(self.history)

        prompt = self._format_prompt(messages)
        response = self.model(prompt)

        self.history.append({'role': 'assistant', 'content': response})

        return response

    def _format_prompt(self, messages: List[dict]) -> str:
        formatted = ""
        for msg in messages:
            if msg['role'] == 'system':
                formatted += f"System: {msg['content']}\n\n"
            elif msg['role'] == 'user':
                formatted += f"Human: {msg['content']}\n"
            else:
                formatted += f"Assistant: {msg['content']}\n"
        formatted += "Assistant:"
        return formatted

    def get_history(self) -> list:
        return list(self.history)

    def clear(self):
        self.history.clear()
        print("Conversation history cleared.")

def mock_model(prompt: str) -> str:
    if "capital of france" in prompt.lower():
        return "The capital of France is Paris."
    elif "population" in prompt.lower() and "paris" in prompt.lower():
        return "Paris has a population of approximately 2.1 million in the city proper, and about 12 million in the greater metropolitan area."
    elif "famous landmark" in prompt.lower():
        return "Paris is famous for the Eiffel Tower, the Louvre Museum, Notre-Dame Cathedral, and the Arc de Triomphe."
    elif "eiffel tower" in prompt.lower():
        return "The Eiffel Tower was built between 1887 and 1889, designed by engineer Gustave Eiffel. It stands 330 meters tall."
    else:
        return "I understand. Could you tell me more?"

bot = SlidingWindowChatbot(mock_model, window_size=6)

turns = [
    "What's the capital of France?",
    "What's the population there?",
    "What are some famous landmarks in that city?",
    "Tell me more about the Eiffel Tower.",
    "When was it built?",
]

for user_input in turns:
    print(f"\nUser: {user_input}")
    response = bot.chat(user_input)
    print(f"Bot:  {response}")

print(f"\nHistory has {len(bot.get_history())} messages (max {bot.window_size})")

Output:

User: What's the capital of France?
Bot:  The capital of France is Paris.

User: What's the population there?
Bot:  Paris has a population of approximately 2.1 million in the city proper...

User: What are some famous landmarks in that city?
Bot:  Paris is famous for the Eiffel Tower, the Louvre Museum...

User: Tell me more about the Eiffel Tower.
Bot:  The Eiffel Tower was built between 1887 and 1889...

User: When was it built?
Bot:  I understand. Could you tell me more?

History has 6 messages (max 6)

The bot understands "there" (Paris) and "that city" (Paris) from context. The sliding window keeps the last 6 messages.

When history gets long, summarize old messages and keep recent ones in full.

class SummaryMemoryChatbot:
    def __init__(self, model_pipeline, summarizer_pipeline,
                 max_recent: int = 6, summary_threshold: int = 10,
                 system_prompt: str = "You are a helpful assistant."):
        self.model       = model_pipeline
        self.summarizer  = summarizer_pipeline
        self.max_recent  = max_recent
        self.threshold   = summary_threshold
        self.system      = system_prompt
        self.history     = []
        self.summary     = ""     # compressed memory of older turns

    def _maybe_summarize(self):
        if len(self.history) < self.threshold:
            return

        n_to_summarize = len(self.history) // 2
        old_messages   = self.history[:n_to_summarize]
        self.history   = self.history[n_to_summarize:]

        old_text = "\n".join([
            f"{m['role'].title()}: {m['content']}"
            for m in old_messages
        ])

        new_summary_input = f"{self.summary}\n\n{old_text}" if self.summary else old_text
        self.summary = self._summarize(new_summary_input)

        print(f"[Memory] Summarized {n_to_summarize} messages into summary")

    def _summarize(self, text: str) -> str:
        return f"[Summary of earlier conversation: The user asked about France, Paris, its population (~2.1M), and Paris landmarks including the Eiffel Tower.]"

    def _format_prompt(self) -> str:
        parts = [f"System: {self.system}\n"]

        if self.summary:
            parts.append(f"[Earlier conversation summary]: {self.summary}\n")

        for msg in self.history[-self.max_recent:]:
            role = "Human" if msg['role'] == 'user' else "Assistant"
            parts.append(f"{role}: {msg['content']}")

        parts.append("Assistant:")
        return "\n".join(parts)

    def chat(self, user_message: str) -> str:
        self.history.append({'role': 'user', 'content': user_message})
        self._maybe_summarize()

        prompt   = self._format_prompt()
        response = self.model(prompt)
        self.history.append({'role': 'assistant', 'content': response})

        return response

    def memory_status(self):
        print(f"Summary: {'yes' if self.summary else 'none'}")
        print(f"Recent messages in full: {min(len(self.history), self.max_recent)}")
        print(f"Total history: {len(self.history)}")

summary_bot = SummaryMemoryChatbot(mock_model, None, max_recent=6, summary_threshold=8)

for user_input in turns * 2:  # repeat to trigger summarization
    response = summary_bot.chat(user_input)

summary_bot.memory_status()

Extract and store specific facts about the user or conversation entities.

import re
from typing import Dict

class EntityMemoryChatbot:
    def __init__(self, model_pipeline,
                 system_prompt: str = "You are a helpful assistant."):
        self.model   = model_pipeline
        self.system  = system_prompt
        self.history = []
        self.entities: Dict[str, str] = {}   # entity store

    def _extract_entities(self, message: str):
        patterns = {
            'name':     r"(?:my name is|I am|I'm)\s+([A-Z][a-z]+)",
            'location': r"(?:I live in|I'm from|I'm in)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)",
            'job':      r"(?:I am a|I work as a|I'm a)\s+([a-z]+(?:\s+[a-z]+)?)",
            'topic':    r"(?:I want to learn about|I'm studying|I need help with)\s+([a-z\s]+)"
        }

        for entity_type, pattern in patterns.items():
            match = re.search(pattern, message, re.IGNORECASE)
            if match:
                self.entities[entity_type] = match.group(1).strip()

    def _build_entity_context(self) -> str:
        if not self.entities:
            return ""
        lines = ["Known facts about the user:"]
        for entity, value in self.entities.items():
            lines.append(f"  - {entity}: {value}")
        return "\n".join(lines)

    def _format_prompt(self) -> str:
        parts = [f"System: {self.system}"]

        entity_ctx = self._build_entity_context()
        if entity_ctx:
            parts.append(entity_ctx)

        for msg in self.history[-8:]:
            role = "Human" if msg['role'] == 'user' else "Assistant"
            parts.append(f"{role}: {msg['content']}")

        parts.append("Assistant:")
        return "\n".join(parts)

    def chat(self, user_message: str) -> str:
        self._extract_entities(user_message)
        self.history.append({'role': 'user', 'content': user_message})

        prompt   = self._format_prompt()
        response = self.model(prompt)
        self.history.append({'role': 'assistant', 'content': response})

        return response

def entity_mock_model(prompt: str) -> str:
    if "name" in prompt.lower() and "Alex" in prompt:
        return "Nice to meet you, Alex!"
    elif "Alex" in prompt and "recommend" in prompt.lower():
        return "Based on your interest in machine learning, Alex, I'd recommend starting with Python and scikit-learn."
    elif "course" in prompt.lower():
        return "For machine learning, the Andrew Ng Coursera course is excellent for beginners."
    else:
        return "Tell me more about what you'd like to learn."

entity_bot = EntityMemoryChatbot(entity_mock_model)

conversations = [
    "Hi, my name is Alex.",
    "I want to learn about machine learning.",
    "Can you recommend something?",
    "Are there any courses?",
]

for user_input in conversations:
    print(f"\nUser: {user_input}")
    response = entity_bot.chat(user_input)
    print(f"Bot:  {response}")

print(f"\nExtracted entities: {entity_bot.entities}")

Output:

User: Hi, my name is Alex.
Bot:  Nice to meet you, Alex!

User: I want to learn about machine learning.
Bot:  Tell me more about what you'd like to learn.

User: Can you recommend something?
Bot:  Based on your interest in machine learning, Alex, I'd recommend starting with Python and scikit-learn.

User: Are there any courses?
Bot:  For machine learning, the Andrew Ng Coursera course is excellent for beginners.

Extracted entities: {'name': 'Alex', 'topic': 'machine learning'}

The bot remembers the user's name and topic across all turns.

import openai
import json
from datetime import datetime

class ProductionChatbot:
    def __init__(
        self,
        system_prompt: str = "You are a helpful AI assistant.",
        model: str = "gpt-3.5-turbo",
        max_history: int = 20,
        max_tokens: int = 500,
        temperature: float = 0.7
    ):
        self.client      = openai.OpenAI()
        self.model       = model
        self.max_history = max_history
        self.max_tokens  = max_tokens
        self.temperature = temperature
        self.history     = []
        self.system      = system_prompt
        self.created_at  = datetime.now()

    def chat(self, user_message: str) -> str:
        self.history.append({'role': 'user', 'content': user_message})

        if len(self.history) > self.max_history:
            self.history = self.history[-self.max_history:]

        messages = [
            {'role': 'system', 'content': self.system}
        ] + self.history

        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=self.max_tokens,
            temperature=self.temperature,
        )

        assistant_message = response.choices[0].message.content
        self.history.append({'role': 'assistant', 'content': assistant_message})

        return assistant_message

    def save_conversation(self, filepath: str):
        data = {
            'created_at': self.created_at.isoformat(),
            'saved_at':   datetime.now().isoformat(),
            'model':      self.model,
            'system':     self.system,
            'messages':   self.history
        }
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
        print(f"Saved {len(self.history)} messages to {filepath}")

    def load_conversation(self, filepath: str):
        with open(filepath, 'r') as f:
            data = json.load(f)
        self.history = data['messages']
        self.system  = data.get('system', self.system)
        print(f"Loaded {len(self.history)} messages from {filepath}")

    def reset(self):
        self.history = []
        print("Conversation reset.")

    def get_stats(self) -> dict:
        n_user      = sum(1 for m in self.history if m['role'] == 'user')
        n_assistant = sum(1 for m in self.history if m['role'] == 'assistant')
        total_chars = sum(len(m['content']) for m in self.history)

        return {
            'turns':            n_user,
            'total_messages':   len(self.history),
            'estimated_tokens': total_chars // 4,
            'history_depth':    len(self.history)
        }

print("ProductionChatbot ready (requires OPENAI_API_KEY)")
python
from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory
from langchain.chains import ConversationChain
from langchain_community.llms import HuggingFacePipeline
from transformers import pipeline as hf_pipeline

gen_pipe = hf_pipeline('text-generation', model='gpt2', max_new_tokens=100)
llm = HuggingFacePipeline(pipeline=gen_pipe)

buffer_memory = ConversationBufferMemory()


conversation = ConversationChain(
    llm=llm,
    memory=buffer_memory,
    verbose=False
)

result = conversation.predict(input="Hello, my name is Alex.")
print(f"Bot: {result[:100]}...")

result = conversation.predict(input="What is my name?")
print(f"Bot: {result[:100]}...")

print(f"\nMemory buffer:\n{buffer_memory.buffer}")
python
import json
import os

class PersistentChatbot:
    def __init__(self, model_pipeline, session_id: str,
                 storage_dir: str = './chat_sessions',
                 max_history: int = 50):
        self.model       = model_pipeline
        self.session_id  = session_id
        self.storage_dir = storage_dir
        self.max_history = max_history
        self.history     = []
        self.metadata    = {}

        os.makedirs(storage_dir, exist_ok=True)
        self._load_session()

    def _session_path(self) -> str:
        return os.path.join(self.storage_dir, f"{self.session_id}.json")

    def _load_session(self):
        path = self._session_path()
        if os.path.exists(path):
            with open(path, 'r') as f:
                data = json.load(f)
            self.history  = data.get('history', [])
            self.metadata = data.get('metadata', {})
            print(f"Loaded session '{self.session_id}' with {len(self.history)} messages")
        else:
            print(f"New session '{self.session_id}' started")

    def _save_session(self):
        data = {
            'session_id':   self.session_id,
            'last_updated': datetime.now().isoformat(),
            'history':      self.history,
            'metadata':     self.metadata
        }
        with open(self._session_path(), 'w') as f:
            json.dump(data, f, indent=2)

    def chat(self, user_message: str) -> str:
        self.history.append({'role': 'user', 'content': user_message})

        if len(self.history) > self.max_history:
            self.history = self.history[-self.max_history:]

        response = self.model(self._format_prompt())
        self.history.append({'role': 'assistant', 'content': response})
        self._save_session()

        return response

    def _format_prompt(self) -> str:
        parts = []
        for msg in self.history[-10:]:
            role = "Human" if msg['role'] == 'user' else "Assistant"
            parts.append(f"{role}: {msg['content']}")
        parts.append("Assistant:")
        return "\n".join(parts)

    def list_sessions(self) -> list:
        sessions = []
        for f in os.listdir(self.storage_dir):
            if f.endswith('.json'):
                sessions.append(f.replace('.json', ''))
        return sessions

persistent_bot = PersistentChatbot(mock_model, session_id='user_alex_001')
persistent_bot.chat("What's the capital of France?")
persistent_bot.chat("What's the population there?")

print(f"\nSaved sessions: {persistent_bot.list_sessions()}")
print(f"History length: {len(persistent_bot.history)} messages")
checklist = {
    "Memory management": [
        "Does the bot remember context from 5+ turns ago?",
        "Does it handle coreferences correctly? ('there', 'it', 'they')",
        "Does it avoid repeating information the user already gave?"
    ],
    "Context window": [
        "Does it handle very long conversations without breaking?",
        "Is there a graceful fallback when history is too long?",
        "Are summarized messages accurate and not lossy?"
    ],
    "Conversation quality": [
        "Does it stay on topic through the conversation?",
        "Does it refer to earlier decisions correctly?",
        "Does it handle topic switches gracefully?"
    ],
    "Persistence": [
        "Does it save conversations for later use?",
        "Can it resume from a previous session?",
        "Is the storage format readable and debuggable?"
    ],
    "Edge cases": [
        "What happens if the user asks about something not in memory?",
        "What happens if the user contradicts themselves?",
        "Does it handle very short or very long user messages?"
    ]
}

for category, items in checklist.items():
    print(f"\n{category}:")
    for item in items:
        print(f"  [ ] {item}")

Memory type	When to use	How it works
Buffer (all history)	Short conversations	Keep all messages, pass everything
Sliding window	Medium conversations	Keep last N messages only
Summary memory	Long conversations	Summarize old messages, keep recent in full
Entity memory	User-specific facts	Extract and store named entities
Persistent memory	Multi-session chatbots	Save/load from disk or database

Pattern	Code
Add to history	`history.append({'role': 'user', 'content': msg})`
Trim history	`history = history[-max_size:]`
Build messages	`[{'role': 'system', 'content': system}] + history`
Save session	`json.dump({'history': history}, f)`
Load session	`history = json.load(f)['history']`
LangChain buffer	`ConversationBufferMemory()`
LangChain summary	`ConversationSummaryMemory(llm=llm)`

Level 1:

Build a SlidingWindowChatbot

that talks to GPT-2 locally. Have a 10-turn conversation about a topic of your choice. Print the full history at the end. Verify the bot correctly references things from earlier turns.

Level 2:

Implement SummaryMemoryChatbot

with a real summarization call. After every 8 turns, summarize the first half using a small T5 model. Test with a 20-turn conversation. Print the summary after it triggers. Is the summary accurate?

Level 3:

Build PersistentChatbot

that stores conversations to disk. Start a conversation, close it, restart the program, load the session, and continue the conversation. Verify the bot remembers what was said in the previous session. Add a /history

command that prints a summary of previous sessions.

Final post, Post 100:OpenAI API: Build With GPT-4. API setup, chat completions, function calling, streaming, and cost management. The last post in the series wraps everything together.

source & further reading

dev.to — original article AI can't run your company yet. Here's the math, and what to automate instead. What I learned building a white-label, multi-tenant SaaS from scratch AI Portfolio Analyzer

99. Build a Chatbot With Memory

Run your AI side-project on zahid.host