Source: dev.to · Topic: large-language-models

Building a Slack Bot That Actually Remembers: slacktag-oss

A developer built slacktag-oss, an open-source Slack bot with persistent semantic memory powered by any LLM and Mem0's managed memory layer, eliminating the need for a vector database. The bot uses Mem0's cloud service for stateful memory, allowing stateless bot processes that can be restarted without losing context. The project aims to replicate the conversational continuity of commercial tools like Claude Tag while remaining provider-agnostic.

Published Jun 26, 2026 · 8 min read

How I built an open-source Slack assistant with persistent semantic memory, powered by any LLM and Mem0's managed memory layer, with no vector database required.

Most AI Slack bots have the memory of a goldfish. Every conversation starts from scratch. You ask it about your sprint goals, it gives a great answer, then three days later you ask a follow-up and it has no idea what you're talking about. You end up re-explaining context constantly.

The commercial solution to this is Claude Tag, a Slack integration that maintains genuine conversational continuity. But it's tied to one provider and not open-source.

slacktag-oss is our attempt to replicate that experience: a Slack bot with real, semantic, persistent memory that works with any LLM, including ones running entirely on your laptop.

A Python Slack bot with:

!clear and !memory commands

Before diving into code, here's the full request lifecycle:

┌─────────────────────────────────────────────────────────────┐
│                         Slack                               │
│  @mention in channel  ──┐                                   │
│  DM to bot            ──┼──► Slack Events API               │
│  Thread reply         ──┘         │                         │
└───────────────────────────────────│─────────────────────────┘
                                    │ (Socket Mode / HTTP)
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│                     slack-bolt (Python)                     │
│   bot.py  ──►  router.py  ──►  handler.py                   │
│                                    │                        │
│                    ┌───────────────┤                        │
│                    │               │                        │
│                    ▼               ▼                        │
│              Mem0 Client      LangChain                     │
│              (managed)        ChatOpenAI                    │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │   Mem0 Managed Cloud  │
        │  Vector Embeddings    │
        │  Entity Extraction    │
        │  Deduplication        │
        └───────────────────────┘

The key design decision: Mem0 is the only stateful dependency. There's no database to manage, no Redis, no Qdrant. The bot process itself is stateless, so you can restart it freely without losing any memory.

slacktag-oss/
├── main.py
├── config/settings.py       ← Pydantic settings from .env
├── core/
│   ├── bot.py               ← Slack Bolt app + event registration
│   ├── handler.py           ← All orchestration logic lives here
│   └── router.py            ← Dispatches channel mentions vs DMs
├── memory/
│   ├── base.py              ← Abstract interface
│   ├── channel_memory.py    ← Channel + thread scoped memory
│   ├── dm_memory.py         ← Per-user private memory
│   └── mem0_store.py        ← Mem0 client factory
├── llm/client.py            ← ChatOpenAI factory
└── tools/registry.py        ← Tool plugin stub (v2)

The typical approach to bot memory is a rolling window: keep the last N messages in the prompt. This breaks down fast: context gets stale, important things fall out of the window, and token costs grow linearly with the window size.

Mem0 takes a different approach. When you store a conversation, it:

When you later ask a question, you get back the most relevant past memories, not just the most recent ones. A user's preference mentioned three weeks ago will surface when relevant, even if hundreds of messages happened in between.
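To make the contrast concrete, here is a toy sketch comparing a rolling window with relevance-based retrieval. It is only an illustration: plain word overlap stands in for embedding similarity, and the helper names (`rolling_window`, `top_relevant`) are hypothetical and not part of slacktag-oss or Mem0.

```python
from collections import deque


def tokenize(text: str) -> set[str]:
    """Lowercase words with trailing punctuation stripped (toy tokenizer)."""
    return {word.strip(".,?!").lower() for word in text.split()}


def rolling_window(messages: list[str], n: int) -> list[str]:
    """Keep only the last n messages, as a naive bot would."""
    return list(deque(messages, maxlen=n))


def top_relevant(messages: list[str], query: str, k: int = 1) -> list[str]:
    """Rank messages by word overlap with the query.

    Word overlap is a crude stand-in for semantic similarity; a real
    memory layer would compare embeddings instead.
    """
    query_words = tokenize(query)
    scored = [
        (len(query_words & tokenize(message)), index, message)
        for index, message in enumerate(messages)
    ]
    scored = [entry for entry in scored if entry[0] > 0]
    scored.sort(key=lambda entry: (-entry[0], entry[1]))
    return [message for _, _, message in scored[:k]]


# One important fact followed by 50 unrelated messages.
history = ["Our API rate limit is 100 req/min per tenant."]
history += [f"chatter {i}" for i in range(50)]

window = rolling_window(history, 20)
hits = top_relevant(history, "What is the rate limit per tenant?")
```

In this sketch the rate-limit fact has already fallen out of the 20-message window, while the relevance lookup still returns it.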

Because we're using Mem0's managed cloud, the entire backend is three lines:

from mem0 import MemoryClient
from config.settings import settings

def get_mem0_client() -> MemoryClient:
    return MemoryClient(api_key=settings.MEM0_API_KEY)

No vector database config. No embedding model to choose. No collection names to manage.

The key insight for a Slack bot is that different conversations need different memory boundaries:

# ChannelMemory: one scope per channel, or per thread when thread_ts is set
def scope_id(self, channel_id: str, thread_ts: str = None) -> str:
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"   # isolated thread
    return f"channel:{channel_id}"                   # shared channel

# DMMemory: one scope per user
def scope_id(self, user_id: str) -> str:
    return f"dm:{user_id}"                           # private per user

Mem0 uses this string as a user_id, so anything stored under channel:C12345 is shared by everyone in that channel. Anything under dm:U67890 is private. Thread memory is completely isolated so a debugging session in a thread doesn't pollute the main channel's memory.
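The isolation described above can be illustrated with a small sketch. It assumes a plain dictionary as a stand-in for Mem0, and the standalone functions `channel_scope` and `dm_scope` are hypothetical names that mirror the two `scope_id` methods.

```python
from collections import defaultdict
from typing import Optional


def channel_scope(channel_id: str, thread_ts: Optional[str] = None) -> str:
    """Scope key for a channel, or for a single thread inside it."""
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"
    return f"channel:{channel_id}"


def dm_scope(user_id: str) -> str:
    """Scope key for a user's private DM memory."""
    return f"dm:{user_id}"


# Hypothetical in-process store; the real bot hands these keys to Mem0.
store: dict[str, list[str]] = defaultdict(list)

store[channel_scope("C12345")].append("Rate limit is 100 req/min per tenant")
store[channel_scope("C12345", "1719400000.000100")].append("Stack trace from a debugging session")
store[dm_scope("U67890")].append("Prefers short answers")

# The thread's memories never appear under the parent channel's key.
channel_memories = store[channel_scope("C12345")]
```

Because the thread timestamp is part of the key, two threads in the same channel also get separate scopes.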

Both ChannelMemory and DMMemory implement the same four-method interface:

from abc import ABC, abstractmethod

class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...

This makes it easy to swap backends later: implement BaseMemory, update the factory, done.
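As an illustration of what a swapped-in backend might look like, here is a minimal in-process implementation of the same interface, for example for unit tests. The class name `InMemoryMemory` and its keyword-matching `search` are assumptions made for this sketch; they are not part of the project, and the search is far cruder than Mem0's semantic retrieval.

```python
from abc import ABC, abstractmethod


class BaseMemory(ABC):
    """Same four-method interface as in the article."""

    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...


class InMemoryMemory(BaseMemory):
    """Hypothetical backend that keeps everything in a dict (lost on restart)."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def add(self, messages: list[dict], scope_id: str) -> None:
        entries = self._store.setdefault(scope_id, [])
        for message in messages:
            # Store under a "memory" key so callers can read entries
            # the same way they read Mem0 results in build_messages.
            entries.append({"memory": message["content"], "role": message["role"]})

    def search(self, query: str, scope_id: str) -> list[dict]:
        # Keyword overlap only; a stand-in for semantic search.
        words = set(query.lower().split())
        return [
            entry
            for entry in self._store.get(scope_id, [])
            if words & set(entry["memory"].lower().split())
        ]

    def get_all(self, scope_id: str) -> list[dict]:
        return list(self._store.get(scope_id, []))

    def clear(self, scope_id: str) -> None:
        self._store.pop(scope_id, None)
```

Unlike the Mem0-backed classes, this version makes the bot process stateful, so it only suits local experiments and tests.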

from langchain_openai import ChatOpenAI
from config.settings import settings

def get_llm() -> ChatOpenAI:
    return ChatOpenAI(
        base_url=settings.LLM_BASE_URL,
        api_key=settings.LLM_API_KEY,
        model=settings.LLM_MODEL,
        temperature=0.7,
        streaming=True,
    )

base_url is the main thing that changes between providers, along with the matching API key and model name in .env. Ollama, LM Studio, OpenAI, Groq, Together AI: all work without touching any other code.
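For orientation, the sketch below maps those providers to the OpenAI-compatible base URLs they commonly document. Treat the URLs as assumptions to verify against each provider's current documentation; the Ollama one matches the default in the project's settings. The helper `llm_env` is a hypothetical name, not part of the codebase.

```python
# Commonly documented OpenAI-compatible endpoints (assumptions; verify before use).
PROVIDER_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}


def llm_env(provider: str, api_key: str, model: str) -> dict[str, str]:
    """Build the three LLM_* values that would go into .env for a provider."""
    if provider not in PROVIDER_BASE_URLS:
        raise ValueError(f"unknown provider: {provider}")
    return {
        "LLM_BASE_URL": PROVIDER_BASE_URLS[provider],
        "LLM_API_KEY": api_key,
        "LLM_MODEL": model,
    }


# Local Ollama: the key is a placeholder because Ollama ignores it.
local = llm_env("ollama", "ollama", "llama3.2")
```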

handler.py is the heart of the bot. For every request, it:

def handle_channel_mention(channel_id, user_id, text, thread_ts=None):
    # Pick the memory scope: the thread if there is one, else the channel
    scope = channel_memory.scope_id(channel_id, thread_ts)

    # Built-in commands short-circuit before any LLM call
    if text.strip() == "!clear":
        channel_memory.clear(scope)
        return "Memory cleared."
    if text.strip() == "!memory":
        return format_memories(channel_memory.get_all(scope))

    # Retrieve memories relevant to this message, plus everything in scope
    relevant = channel_memory.search(text, scope)
    history  = channel_memory.get_all(scope)

    # Build the prompt and call the LLM
    messages = build_messages(system_prompt, relevant, history, text)
    response = llm.invoke(messages)
    reply    = response.content

    # Store the new exchange so later requests can recall it
    channel_memory.add(
        [{"role": "user", "content": text},
         {"role": "assistant", "content": reply}],
        scope,
    )
    return reply
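One detail worth noting: in app_mention events, Slack includes the mention token itself (for example <@U0123ABC>) in the text field, so an exact comparison against "!clear" only matches once that token has been removed. The article does not show where this happens, so the following is a minimal sketch under that assumption; `strip_mention` is a hypothetical helper name.

```python
import re

# Leading mention token such as "<@U0123ABC>" or "<@U0123ABC|name>".
MENTION_PREFIX = re.compile(r"^\s*<@[A-Z0-9]+(?:\|[^>]+)?>\s*")


def strip_mention(text: str) -> str:
    """Remove one leading bot mention and surrounding whitespace."""
    return MENTION_PREFIX.sub("", text, count=1).strip()


command = strip_mention("<@U0123ABC> !clear")
question = strip_mention("  <@U0123ABC>   what's our rate limit?")
```

Calling this in the router before `handle_channel_mention` would let the command checks compare against the bare text.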

The message list passed to the LLM is assembled in a specific order:

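from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from config.settings import settings

MAX_HISTORY_MESSAGES = settings.MAX_HISTORY_MESSAGES

def build_messages(system_prompt, relevant_memories, recent_history, user_input):
    # 1. Persona and instructions
    messages = [SystemMessage(content=system_prompt)]

    # 2. Retrieved memories, injected as a second system message
    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(SystemMessage(
            content=f"Relevant context from earlier:\n{memory_context}"
        ))

    # 3. The most recent history entries, oldest first
    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") == "user":
            messages.append(HumanMessage(content=entry["content"]))
        elif entry.get("role") == "assistant":
            messages.append(AIMessage(content=entry["content"]))

    # 4. The current user message always comes last
    messages.append(HumanMessage(content=user_input))
    return messages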

The two-system-message pattern keeps the bot's persona and instructions separate from the injected memory context, which is cleaner for the model to reason about.

slack-bolt makes event handling clean:

from slack_bolt import App
from config.settings import settings
from core.router import route_mention, route_dm

app = App(token=settings.SLACK_BOT_TOKEN, signing_secret=settings.SLACK_SIGNING_SECRET)

@app.event("app_mention")
def on_mention(event, say):
    route_mention(event, say)   # channel / thread flow

@app.event("message")
def on_message(event, say):
    if event.get("channel_type") == "im" and not event.get("bot_id"):
        route_dm(event, say)    # DM flow, ignore bot's own messages
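The DM filter above could be tightened a little. Slack also delivers edits and deletions as message events carrying a subtype (such as message_changed), which a bot usually should not answer. A conservative sketch follows, assuming a hypothetical predicate named `is_user_dm`; skipping every subtype is a deliberate simplification and would also drop things like file shares.

```python
def is_user_dm(event: dict) -> bool:
    """Return True only for plain, human-written direct messages."""
    if event.get("channel_type") != "im":
        return False  # not a DM
    if event.get("bot_id"):
        return False  # written by a bot, possibly this one
    if event.get("subtype"):
        return False  # edits, deletions, joins, and similar
    return bool(event.get("text"))


plain = {"type": "message", "channel_type": "im", "user": "U67890", "text": "hi"}
edited = {"type": "message", "channel_type": "im", "subtype": "message_changed"}
from_bot = {"type": "message", "channel_type": "im", "bot_id": "B123", "text": "hello"}
```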

router.py extracts the relevant fields and calls the appropriate handler:

def route_mention(event, say):
    channel_id = event.get("channel")
    thread_ts  = event.get("thread_ts")
    text       = event.get("text", "")

    reply = handle_channel_mention(channel_id, event["user"], text, thread_ts)
    say(text=reply, thread_ts=thread_ts or event["ts"])

Replies always go back to a thread: if the mention was in a thread, the bot stays in that thread; otherwise it starts one under the message that mentioned it.

All config lives in one place with validation:

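# Pydantic v2; on Pydantic v1 use `from pydantic import BaseSettings` instead
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    SLACK_BOT_TOKEN: str
    SLACK_APP_TOKEN: str
    SLACK_SIGNING_SECRET: str
    LLM_BASE_URL: str = "http://localhost:11434/v1"
    LLM_API_KEY: str = "ollama"
    LLM_MODEL: str = "llama3.2"
    MEM0_API_KEY: str
    BOT_NAME: str = "Claude"
    MAX_HISTORY_MESSAGES: int = 20
    SYSTEM_PROMPT: str = ""

    class Config:
        env_file = ".env"

# Instantiated once at import time; other modules import this object
settings = Settings()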

Missing required fields (the Slack tokens, the Mem0 key) raise a ValidationError at startup, so the bot fails fast before any event processing begins.

pip install -r requirements.txt

python main.py

That's it. No Docker, no Qdrant, no ngrok. Invite the bot to a channel, @mention it, and it starts building memory from the first message.

Here's a realistic example. Day 1:

User: @slacktag Our API rate limit is 100 req/min per tenant. Keep that in mind for capacity planning.

Bot: Got it. I'll factor that in for any capacity discussions.

Day 3 (hundreds of messages later in the channel):

User: @slacktag We're about to onboard 5 new enterprise tenants. Any concerns?

Bot: A few things to consider: with your current API rate limit of 100 req/min per tenant, 5 new enterprise tenants could significantly increase peak load. You may want to review your rate limiting strategy before onboarding...

Mem0 surfaced the rate limit fact from Day 1 because it was semantically relevant to the capacity question, even though it was nowhere in the recent message window.

For production, swap SocketModeHandler for a standard HTTP adapter:

from slack_bolt.adapter.flask import SlackRequestHandler
from flask import Flask, request

flask_app = Flask(__name__)
handler = SlackRequestHandler(app)

@flask_app.route("/slack/events", methods=["POST"])
def events():
    return handler.handle(request)

Point your Slack app's Request URL to https://your-domain/slack/events, deploy anywhere (Fly.io, Railway, Cloud Run all work), and you're done. There is no state in the server; Mem0 holds everything.

A few extensions that would make this significantly more powerful:

Pluggable tools: tools/registry.py is stubbed out for LangChain tool integration. Adding web search (Tavily, Brave Search) or a code execution sandbox would turn this into a capable agent.

Mem0 graph memory: Mem0 supports a graph mode that tracks relationships between entities across conversations. You could map out who's on which team, what projects are in flight, and surface that context automatically.

Per-channel LLM config: let admins set a different model per channel (e.g., a powerful model for #architecture, a fast cheap model for #random).

Reaction triggers: react with 🧠 to explicitly add a message to memory; react with 🗑️ to remove a fact. Much more controllable than pure auto-extraction.

!summarize: call mem0.get_all() and ask the LLM to produce a readable summary of everything it knows about this channel.

The codebase is intentionally small. handler.py is ~100 lines. Every module does one thing. If you want to contribute:

git clone https://github.com/harishkotra/slacktag-oss
cd slacktag-oss
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

Pick any feature from the table in the README, implement it, and open a PR. The architecture is designed to stay simple: add without entangling.
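As a possible starting point for the reaction-trigger idea, the sketch below turns a Slack reaction_added event into an action. It assumes the short names Slack uses for the two emoji (brain and wastebasket); the function `reaction_action` and the action labels are hypothetical. Fetching the reacted-to message and updating memory are left out.

```python
from typing import Optional

# Emoji short name -> what the bot should do with the reacted-to message.
REACTION_ACTIONS = {
    "brain": "remember",
    "wastebasket": "forget",
}


def reaction_action(event: dict) -> Optional[tuple[str, str, str]]:
    """Map a reaction_added event to (action, channel_id, message_ts).

    Returns None for other event types, non-message items, and
    reactions the bot does not care about.
    """
    if event.get("type") != "reaction_added":
        return None
    item = event.get("item", {})
    if item.get("type") != "message":
        return None
    action = REACTION_ACTIONS.get(event.get("reaction", ""))
    if action is None:
        return None
    return action, item["channel"], item["ts"]


example = {
    "type": "reaction_added",
    "user": "U67890",
    "reaction": "brain",
    "item": {"type": "message", "channel": "C12345", "ts": "1719400000.000100"},
}
result = reaction_action(example)
```

A handler registered for the reaction_added event could call this and, for a "remember" result, fetch the message text before storing it under the channel's scope.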
