Building a Slack Bot That Actually Remembers: slacktag-oss


How I built an open-source Slack assistant with persistent semantic memory, powered by any LLM and Mem0's managed memory layer — no vector database required.

Most AI Slack bots have the memory of a goldfish. Every conversation starts from scratch. You ask it about your sprint goals, it gives a great answer, then three days later you ask a follow-up and it has no idea what you're talking about. You end up re-explaining context constantly.

The commercial solution to this is Claude Tag — a Slack integration that maintains genuine conversational continuity. But it's tied to one provider and not open-source.

slacktag-oss is our attempt to replicate that experience: a Slack bot with real, semantic, persistent memory that works with any LLM, including ones running entirely on your laptop.

A Python Slack bot with:

!clear and !memory commands

Before diving into code, here's the full request lifecycle:

┌─────────────────────────────────────────────────────────────┐
│                         Slack                               │
│  @mention in channel  ──┐                                   │
│  DM to bot            ──┼──► Slack Events API               │
│  Thread reply         ──┘         │                         │
└───────────────────────────────────│─────────────────────────┘
                                    │ (Socket Mode / HTTP)
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│                      slack-bolt (Python)                     │
│   bot.py  ──►  router.py  ──►  handler.py                  │
│                                    │                        │
│                    ┌───────────────┤                        │
│                    │               │                        │
│                    ▼               ▼                        │
│              Mem0 Client      LangChain                     │
│              (managed)        ChatOpenAI                    │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │   Mem0 Managed Cloud  │
        │  Vector Embeddings    │
        │  Entity Extraction    │
        │  Deduplication        │
        └───────────────────────┘

The key design decision: Mem0 is the only stateful dependency. There's no database to manage, no Redis, no Qdrant. The bot process itself is stateless — you can restart it freely without losing any memory.

slacktag-oss/
├── main.py
├── config/settings.py       ← Pydantic settings from .env
├── core/
│   ├── bot.py               ← Slack Bolt app + event registration
│   ├── handler.py           ← All orchestration logic lives here
│   └── router.py            ← Dispatches channel mentions vs DMs
├── memory/
│   ├── base.py              ← Abstract interface
│   ├── channel_memory.py    ← Channel + thread scoped memory
│   ├── dm_memory.py         ← Per-user private memory
│   └── mem0_store.py        ← Mem0 client factory
├── llm/client.py            ← ChatOpenAI factory
└── tools/registry.py        ← Tool plugin stub (v2)

The typical approach to bot memory is a rolling window: keep the last N messages in the prompt. This breaks down fast — context gets stale, important things fall out of the window, and token costs grow linearly.
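To see why a rolling window forgets, here is a minimal sketch using only the standard library. The window size and the messages are made-up values chosen to make the effect visible; they are not taken from the project.

```python
from collections import deque

# Hypothetical rolling-window memory: only the last N messages survive.
WINDOW_SIZE = 3  # assumed value, kept small for the demo

window = deque(maxlen=WINDOW_SIZE)

# An important fact arrives first...
window.append("Our API rate limit is 100 req/min per tenant.")

# ...followed by unrelated chatter that fills the window.
for chatter in ["standup at 10", "lunch?", "deploy went fine"]:
    window.append(chatter)

# The rate-limit fact has been pushed out by three unrelated messages.
rate_limit_remembered = any("rate limit" in message for message in window)
print(list(window))
print(rate_limit_remembered)
```

Raising the window size only postpones the problem, and every extra message in the window is paid for in prompt tokens on every request.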

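Mem0 takes a different approach. When you store a conversation, it extracts the entities involved, embeds the result as vectors, and deduplicates against what is already stored (the three responsibilities listed in the Mem0 Managed Cloud box of the diagram above).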

When you later ask a question, you get back the most relevant past memories — not just the most recent ones. A user's preference mentioned three weeks ago will surface when relevant, even if hundreds of messages happened in between.
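The difference between "most relevant" and "most recent" can be illustrated with a toy ranking function. This is only a sketch: it scores by word overlap (bag-of-words cosine similarity) as a stand-in for real embeddings, and it is not how Mem0 works internally. The names `bag_of_words`, `cosine`, and `toy_search` are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def toy_search(query: str, memories: list, top_k: int = 1) -> list:
    """Return the top_k memories ranked by similarity to the query."""
    q = bag_of_words(query)
    ranked = sorted(memories, key=lambda m: cosine(q, bag_of_words(m)), reverse=True)
    return ranked[:top_k]

memories = [
    "API rate limit is 100 requests per minute per tenant",  # oldest
    "Standup moved to 10am",
    "Release branch was cut on Friday",                      # newest
]

top = toy_search("How many requests can each tenant send per minute?", memories)
print(top)
```

The oldest memory wins because it shares the most vocabulary with the question. Real embeddings go further and can match on meaning even when no words are shared.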

Because we're using Mem0's managed cloud, the entire backend is three lines:

from mem0 import MemoryClient
from config.settings import settings

def get_mem0_client() -> MemoryClient:
    return MemoryClient(api_key=settings.MEM0_API_KEY)

No vector database config. No embedding model to choose. No collection names to manage.

The key insight for a Slack bot is that different conversations need different memory boundaries:

from typing import Optional

# memory/channel_memory.py (method of ChannelMemory)
def scope_id(self, channel_id: str, thread_ts: Optional[str] = None) -> str:
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"   # isolated thread
    return f"channel:{channel_id}"                   # shared channel

# memory/dm_memory.py (method of DMMemory)
def scope_id(self, user_id: str) -> str:
    return f"dm:{user_id}"                           # private per user

Mem0 uses this string as a user_id: anything stored under channel:C12345 is shared by everyone in that channel. Anything under dm:U67890 is private. Thread memory is completely isolated, so a debugging session in a thread doesn't pollute the main channel's memory.
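As a quick illustration, the scoping rules can be exercised as standalone functions. The function names `channel_scope_id` and `dm_scope_id` and the thread timestamp are hypothetical; the string formats follow the methods shown above.

```python
from typing import Optional

def channel_scope_id(channel_id: str, thread_ts: Optional[str] = None) -> str:
    """Threads get an isolated scope; top-level messages share the channel scope."""
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"
    return f"channel:{channel_id}"

def dm_scope_id(user_id: str) -> str:
    """Each user's DM conversation has its own private scope."""
    return f"dm:{user_id}"

print(channel_scope_id("C12345"))                       # channel:C12345
print(channel_scope_id("C12345", "1700000000.000100"))  # thread:C12345:1700000000.000100
print(dm_scope_id("U67890"))                            # dm:U67890
```

Two threads in the same channel produce different scope strings, which is what keeps their memories apart.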

Both ChannelMemory and DMMemory implement the same four-method interface:

from abc import ABC, abstractmethod

class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...

This makes it easy to swap backends later: implement BaseMemory, update the factory, done.
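As a sketch of what an alternative backend might look like, here is a dictionary-backed store that satisfies the same interface. `InMemoryStore` is a hypothetical class, not part of the repository, and its keyword-overlap search is a crude placeholder for semantic search. It could be handy in unit tests where calling Mem0 is undesirable.

```python
from abc import ABC, abstractmethod

class BaseMemory(ABC):
    """Same four-method interface as in the article."""
    @abstractmethod
    def add(self, messages: list, scope_id: str) -> None: ...
    @abstractmethod
    def search(self, query: str, scope_id: str) -> list: ...
    @abstractmethod
    def get_all(self, scope_id: str) -> list: ...
    @abstractmethod
    def clear(self, scope_id: str) -> None: ...

class InMemoryStore(BaseMemory):
    """Hypothetical test backend: keeps raw messages in a dict, lost on restart."""

    def __init__(self) -> None:
        self._store = {}

    def add(self, messages: list, scope_id: str) -> None:
        entries = self._store.setdefault(scope_id, [])
        for message in messages:
            entries.append({"memory": message["content"], "role": message["role"]})

    def search(self, query: str, scope_id: str) -> list:
        # Placeholder relevance: any shared lowercase word counts as a hit.
        words = set(query.lower().split())
        return [
            entry for entry in self._store.get(scope_id, [])
            if words & set(entry["memory"].lower().split())
        ]

    def get_all(self, scope_id: str) -> list:
        return list(self._store.get(scope_id, []))

    def clear(self, scope_id: str) -> None:
        self._store.pop(scope_id, None)

store = InMemoryStore()
store.add([{"role": "user", "content": "rate limit is 100 req/min"}], "channel:C12345")
hits = store.search("what is the rate limit", "channel:C12345")
other_scope_hits = store.search("what is the rate limit", "dm:U67890")
```

Unlike the Mem0-backed version, this store makes the bot process stateful, so it is only suitable for tests or local experiments.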

from langchain_openai import ChatOpenAI
from config.settings import settings

def get_llm() -> ChatOpenAI:
    return ChatOpenAI(
        base_url=settings.LLM_BASE_URL,
        api_key=settings.LLM_API_KEY,
        model=settings.LLM_MODEL,
        temperature=0.7,
        streaming=True,
    )

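base_url, together with the matching API key and model name, is the only thing that changes between providers. Ollama, LM Studio, OpenAI, Groq, Together AI: any provider that exposes an OpenAI-compatible endpoint works without touching any other code.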
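A small helper can make the provider switch explicit. This is a sketch under assumptions: `PROVIDER_BASE_URLS` and `llm_env` are hypothetical names, the Ollama URL is the default from the project's settings, and the other endpoints are the commonly documented OpenAI-compatible base URLs, which should be checked against each provider's current documentation.

```python
# Hypothetical presets; verify endpoints against each provider's docs.
PROVIDER_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",     # default in the article's Settings
    "lmstudio": "http://localhost:1234/v1",    # LM Studio's default local server
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
}

def llm_env(provider: str, api_key: str, model: str) -> dict:
    """Build the three LLM_* settings for a given provider."""
    return {
        "LLM_BASE_URL": PROVIDER_BASE_URLS[provider],
        "LLM_API_KEY": api_key,
        "LLM_MODEL": model,
    }

# Local setup matching the article's defaults.
local = llm_env("ollama", api_key="ollama", model="llama3.2")
for key, value in local.items():
    print(f"{key}={value}")
```

The printed lines can be pasted into .env; local servers such as Ollama generally ignore the API key, which is why a dummy value is enough there.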

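handler.py is the heart of the bot. For every request, it resolves the memory scope, handles the !clear and !memory commands, searches Mem0 for relevant memories, builds the prompt, calls the LLM, and stores the new exchange: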

def handle_channel_mention(channel_id, user_id, text, thread_ts=None):
    # Thread replies get their own scope; everything else shares the channel scope.
    scope = channel_memory.scope_id(channel_id, thread_ts)

    # Built-in commands short-circuit before any LLM call.
    if text.strip() == "!clear":
        channel_memory.clear(scope)
        return "Memory cleared."
    if text.strip() == "!memory":
        return format_memories(channel_memory.get_all(scope))

    # Semantic hits for this message, plus everything stored under this scope.
    relevant = channel_memory.search(text, scope)
    history  = channel_memory.get_all(scope)

    messages = build_messages(system_prompt, relevant, history, text)
    response = llm.invoke(messages)
    reply    = response.content

    # Store the new exchange so it is available to later requests.
    channel_memory.add(
        [{"role": "user", "content": text},
         {"role": "assistant", "content": reply}],
        scope,
    )
    return reply
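The handler calls format_memories for the !memory command, but its body is not shown here. A minimal sketch of what such a helper could look like follows; it assumes each memory is a dict with a "memory" key (the same shape build_messages reads), and the wording of the reply strings is invented.

```python
def format_memories(memories: list) -> str:
    """Render stored memories as a numbered list for a Slack reply.

    Assumes each item is a dict with a "memory" key; items without one are skipped.
    """
    facts = [m["memory"] for m in memories if m.get("memory")]
    if not facts:
        return "I don't have any memories for this conversation yet."
    lines = [f"{i}. {fact}" for i, fact in enumerate(facts, start=1)]
    return "Here's what I remember:\n" + "\n".join(lines)

print(format_memories([
    {"memory": "API rate limit is 100 req/min per tenant"},
    {"memory": "5 enterprise tenants are being onboarded"},
]))
```

Returning a friendly message for the empty case avoids posting a blank reply right after !clear.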

The message list passed to the LLM is assembled in a specific order:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

# MAX_HISTORY_MESSAGES is assumed to come from config.settings.

def build_messages(system_prompt, relevant_memories, recent_history, user_input):
    # 1. Persona and instructions.
    messages = [SystemMessage(content=system_prompt)]

    # 2. Retrieved memories, kept in their own system message.
    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(SystemMessage(
            content=f"Relevant context from earlier:\n{memory_context}"
        ))

    # 3. The most recent turns, oldest first.
    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") == "user":
            messages.append(HumanMessage(content=entry["content"]))
        elif entry.get("role") == "assistant":
            messages.append(AIMessage(content=entry["content"]))

    # 4. The message being answered.
    messages.append(HumanMessage(content=user_input))
    return messages

The two-system-message pattern keeps the bot's persona and instructions separate from the injected memory context — cleaner for the model to reason about.
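To make the ordering concrete, here is a dependency-free sketch that mirrors build_messages with (role, content) tuples in place of the LangChain message classes. `build_message_tuples` and the sample inputs are hypothetical; the value 20 is the MAX_HISTORY_MESSAGES default from the settings.

```python
MAX_HISTORY_MESSAGES = 20  # default from the article's Settings

def build_message_tuples(system_prompt, relevant_memories, recent_history, user_input):
    """Same assembly order as build_messages, using plain tuples."""
    messages = [("system", system_prompt)]

    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(("system", f"Relevant context from earlier:\n{memory_context}"))

    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") in ("user", "assistant"):
            messages.append((entry["role"], entry["content"]))

    messages.append(("user", user_input))
    return messages

result = build_message_tuples(
    "You are a helpful Slack assistant.",
    [{"memory": "API rate limit is 100 req/min per tenant"}],
    [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}],
    "Any concerns about onboarding 5 tenants?",
)
roles = [role for role, _ in result]
print(roles)  # ['system', 'system', 'user', 'assistant', 'user']
```

The persona always comes first, the retrieved memories second, and the new question last, so the model reads the context before the request it has to answer.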

slack-bolt makes event handling clean:

from slack_bolt import App

from config.settings import settings
from core.router import route_dm, route_mention

app = App(token=settings.SLACK_BOT_TOKEN, signing_secret=settings.SLACK_SIGNING_SECRET)

@app.event("app_mention")
def on_mention(event, say):
    route_mention(event, say)   # channel / thread flow

@app.event("message")
def on_message(event, say):
    if event.get("channel_type") == "im" and not event.get("bot_id"):
        route_dm(event, say)    # DM flow, ignore bot's own messages
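The DM check can be pulled into a pure function, which makes it easy to test and to tighten. The sketch below is an assumption-laden variant, not the repository's code: `should_handle_dm` is a hypothetical name, and besides the two checks shown above it also skips events that carry a subtype (Slack uses subtypes such as message_changed and message_deleted for edits and deletions) and messages with no text.

```python
def should_handle_dm(event: dict) -> bool:
    """Decide whether a Slack "message" event is a fresh user DM worth answering."""
    if event.get("channel_type") != "im":
        return False            # not a direct message
    if event.get("bot_id"):
        return False            # posted by a bot, possibly this one
    if event.get("subtype"):
        return False            # edits, deletions, and other non-plain messages
    return bool(event.get("text", "").strip())

print(should_handle_dm({"channel_type": "im", "text": "hello"}))
print(should_handle_dm({"channel_type": "im", "subtype": "message_changed"}))
```

Skipping every subtype is a deliberately conservative choice; a bot that should react to file uploads, for example, would need to allow the relevant subtype through.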

router.py extracts the relevant fields and calls the appropriate handler:

def route_mention(event, say):
    channel_id = event.get("channel")
    thread_ts  = event.get("thread_ts")
    text       = event.get("text", "")

    reply = handle_channel_mention(channel_id, event["user"], text, thread_ts)
    say(text=reply, thread_ts=thread_ts or event["ts"])

Replies always go back to the same thread — if the mention was in a thread, the bot stays in that thread.
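One detail worth handling in the router: in an app_mention event, Slack includes the mention token itself in the text, so a message typed as "@slacktag !clear" arrives as something like "<@U0LAN0Z89> !clear". An exact comparison such as text.strip() == "!clear" would not match that string. A minimal sketch of stripping the leading mention follows; `strip_bot_mention` is a hypothetical helper, and the repository may already handle this differently.

```python
import re

# A leading Slack user mention looks like <@U0LAN0Z89>.
MENTION_PREFIX = re.compile(r"^\s*<@[^>]+>\s*")

def strip_bot_mention(text: str) -> str:
    """Remove one leading <@USERID> token and surrounding whitespace."""
    return MENTION_PREFIX.sub("", text, count=1).strip()

print(strip_bot_mention("<@U0LAN0Z89> !clear"))        # !clear
print(strip_bot_mention("<@U0LAN0Z89> what's next?"))  # what's next?
```

Only a mention at the start of the message is removed, so mentions of other users later in the text are left for the LLM to see.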

All config lives in one place with validation:

class Settings(BaseSettings):
    SLACK_BOT_TOKEN: str
    SLACK_APP_TOKEN: str
    SLACK_SIGNING_SECRET: str
    LLM_BASE_URL: str = "http://localhost:11434/v1"
    LLM_API_KEY: str = "ollama"
    LLM_MODEL: str = "llama3.2"
    MEM0_API_KEY: str
    BOT_NAME: str = "Claude"
    MAX_HISTORY_MESSAGES: int = 20
    SYSTEM_PROMPT: str = ""

    class Config:
        env_file = ".env"

Missing required fields (the Slack tokens, the Mem0 key) raise a ValidationError at startup, so the bot fails fast before any event processing begins.
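The fail-fast behavior can be illustrated without Pydantic. The sketch below is a standard-library stand-in, not the project's code: `missing_settings` and `check_settings` are hypothetical helpers, and unlike Pydantic they also treat an empty string as missing.

```python
# The four fields that have no default in the Settings class.
REQUIRED_SETTINGS = (
    "SLACK_BOT_TOKEN",
    "SLACK_APP_TOKEN",
    "SLACK_SIGNING_SECRET",
    "MEM0_API_KEY",
)

def missing_settings(env: dict) -> list:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_SETTINGS if not env.get(key)]

def check_settings(env: dict) -> None:
    """Raise before the bot starts if anything required is missing."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

# Example with only one token set (the token value is a placeholder).
partial_env = {"SLACK_BOT_TOKEN": "xoxb-placeholder"}
print(missing_settings(partial_env))
```

In practice `check_settings(dict(os.environ))` would run once in main.py, which gives the same "crash at startup, not on the first message" property the Pydantic class provides.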

pip install -r requirements.txt

python main.py

That's it. No Docker, no Qdrant, no ngrok. Invite the bot to a channel, @mention it, and it starts building memory from the first message.

Here's a realistic example. Day 1:

User: @slacktag Our API rate limit is 100 req/min per tenant. Keep that in mind for capacity planning.

Bot: Got it. I'll factor that in for any capacity discussions.

Day 3 (hundreds of messages later in the channel):

User: @slacktag We're about to onboard 5 new enterprise tenants. Any concerns?

Bot: A few things to consider: with your current API rate limit of 100 req/min per tenant, 5 new enterprise tenants could significantly increase peak load. You may want to review your rate limiting strategy before onboarding...

Mem0 surfaced the rate limit fact from Day 1 because it was semantically relevant to the capacity question — even though it was nowhere in the recent message window.

For production, swap SocketModeHandler for a standard HTTP adapter:

from slack_bolt.adapter.flask import SlackRequestHandler
from flask import Flask, request

flask_app = Flask(__name__)
handler = SlackRequestHandler(app)

@flask_app.route("/slack/events", methods=["POST"])
def events():
    return handler.handle(request)

Point your Slack app's Request URL to https://your-domain/slack/events, deploy anywhere (Fly.io, Railway, and Cloud Run all work), and you're done. The server holds no state; Mem0 holds everything.

A few extensions that would make this significantly more powerful:

Pluggable tools: tools/registry.py is stubbed out for LangChain tool integration. Adding web search (Tavily, Brave Search) or a code execution sandbox would turn this into a capable agent.

Mem0 graph memory: Mem0 supports a graph mode that tracks relationships between entities across conversations. You could map out who's on which team, what projects are in flight, and surface that context automatically.

Per-channel LLM config: let admins set a different model per channel (e.g., a powerful model for #architecture, a fast cheap model for #random).

Reaction triggers: react with 🧠 to explicitly add a message to memory; react with 🗑️ to remove a fact. Much more controllable than pure auto-extraction.

!summarize: call mem0.get_all() and ask the LLM to produce a readable summary of everything it knows about this channel.

The codebase is intentionally small. handler.py is ~100 lines. Every module does one thing. If you want to contribute:

git clone https://github.com/harishkotra/slacktag-oss
cd slacktag-oss
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

Pick any feature from the table in the README, implement it, and open a PR. The architecture is designed to stay simple — add without entangling.

source & further reading

dev.to: original article
