Building a Slack Bot That Actually Remembers: slacktag-oss

A developer built slacktag-oss, an open-source Slack bot with persistent semantic memory powered by any LLM and Mem0's managed memory layer, eliminating the need for a vector database. The bot uses Mem0's cloud service for stateful memory, allowing stateless bot processes that can be restarted without losing context. The project aims to replicate the conversational continuity of commercial tools like Claude Tag while remaining provider-agnostic.

How I built an open-source Slack assistant with persistent semantic memory, powered by any LLM and Mem0's managed memory layer, with no vector database required.

Most AI Slack bots have the memory of a goldfish. Every conversation starts from scratch. You ask it about your sprint goals, it gives a great answer, then three days later you ask a follow-up and it has no idea what you're talking about. You end up re-explaining context constantly.

The commercial solution to this is Claude Tag, a Slack integration that maintains genuine conversational continuity. But it's tied to one provider and not open-source.

slacktag-oss is our attempt to replicate that experience: a Slack bot with real, semantic, persistent memory that works with any LLM, including ones running entirely on your laptop.
A Python Slack bot with persistent semantic memory, support for any LLM, and built-in `clear` and `memory` commands.

Before diving into code, here's the full request lifecycle:

```
┌─────────────────────────────────────────────────────────────┐
│                            Slack                            │
│  @mention in channel ──┐                                    │
│  DM to bot           ──┼──► Slack Events API                │
│  Thread reply        ──┘          │                         │
└───────────────────────────────────│─────────────────────────┘
                                    │ Socket Mode / HTTP
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│                     slack-bolt (Python)                     │
│  bot.py ──► router.py ──► handler.py                        │
│                               │                             │
│               ┌───────────────┤                             │
│               │               │                             │
│               ▼               ▼                             │
│          Mem0 Client      LangChain                         │
│           (managed)      ChatOpenAI                         │
└─────────────────────────────────────────────────────────────┘
                │
                ▼
    ┌───────────────────────┐
    │  Mem0 Managed Cloud   │
    │  Vector Embeddings    │
    │  Entity Extraction    │
    │  Deduplication        │
    └───────────────────────┘
```

The key design decision: Mem0 is the only stateful dependency. There's no database to manage, no Redis, no Qdrant. The bot process itself is stateless, so you can restart it freely without losing any memory.

```
slacktag-oss/
├── main.py
├── config/settings.py        ← Pydantic settings from .env
├── core/
│   ├── bot.py                ← Slack Bolt app + event registration
│   ├── handler.py            ← All orchestration logic lives here
│   └── router.py             ← Dispatches channel mentions vs DMs
├── memory/
│   ├── base.py               ← Abstract interface
│   ├── channel_memory.py     ← Channel + thread scoped memory
│   ├── dm_memory.py          ← Per-user private memory
│   └── mem0_store.py         ← Mem0 client factory
├── llm/client.py             ← ChatOpenAI factory
└── tools/registry.py         ← Tool plugin stub (v2)
```

The typical approach to bot memory is a rolling window: keep the last N messages in the prompt. This breaks down fast. Context gets stale, important things fall out of the window, and token costs grow linearly with the window size.

Mem0 takes a different approach. When you store a conversation, it extracts the entities and facts worth keeping, embeds them as vectors, and deduplicates them against what it already knows. When you later ask a question, you get back the most relevant past memories, not just the most recent ones.
A user's preference mentioned three weeks ago will surface when relevant, even if hundreds of messages happened in between.

Because we're using Mem0's managed cloud, the entire backend is a few lines:

```python
# memory/mem0_store.py
from mem0 import MemoryClient
from config.settings import settings

def get_mem0_client() -> MemoryClient:
    return MemoryClient(api_key=settings.MEM0_API_KEY)
```

No vector database config. No embedding model to choose. No collection names to manage.

The key insight for a Slack bot is that different conversations need different memory boundaries:

```python
# memory/channel_memory.py
def scope_id(self, channel_id: str, thread_ts: str = None) -> str:
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"  # isolated thread
    return f"channel:{channel_id}"  # shared channel

# memory/dm_memory.py
def scope_id(self, user_id: str) -> str:
    return f"dm:{user_id}"  # private per user
```

Mem0 uses this string as a `user_id`: anything stored under `channel:C12345` is shared by everyone in that channel, and anything under `dm:U67890` is private. Thread memory is completely isolated, so a debugging session in a thread doesn't pollute the main channel's memory.

Both `ChannelMemory` and `DMMemory` implement the same four-method interface:

```python
# memory/base.py
from abc import ABC, abstractmethod

class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...
```

This makes it easy to swap backends later: implement `BaseMemory`, update the factory, done.

```python
# llm/client.py
from langchain_openai import ChatOpenAI
from config.settings import settings

def get_llm() -> ChatOpenAI:
    return ChatOpenAI(
        base_url=settings.LLM_BASE_URL,
        api_key=settings.LLM_API_KEY,
        model=settings.LLM_MODEL,
        temperature=0.7,
        streaming=True,
    )
```

`base_url` (plus the matching API key and model name in `.env`) is the only thing that changes between providers.
Ollama, LM Studio, OpenAI, Groq, and Together AI all work without touching any other code.

`handler.py` is the heart of the bot. For every request, it:

1. Resolves the memory scope (channel or thread).
2. Short-circuits on the built-in `clear` and `memory` commands.
3. Retrieves both semantically relevant memories and recent history.
4. Builds the message list and invokes the LLM.
5. Stores the exchange back into Mem0.

```python
# core/handler.py (simplified)
def handle_channel_mention(channel_id, user_id, text, thread_ts=None):
    scope = channel_memory.scope_id(channel_id, thread_ts)

    # Built-in commands short-circuit before hitting the LLM
    if text.strip() == "clear":
        channel_memory.clear(scope)
        return "Memory cleared."
    if text.strip() == "memory":
        return format_memories(channel_memory.get_all(scope))

    # Dual retrieval: semantic + recency
    relevant = channel_memory.search(text, scope)
    history = channel_memory.get_all(scope)

    messages = build_messages(system_prompt, relevant, history, text)
    response = llm.invoke(messages)
    reply = response.content

    # Store the exchange; Mem0 extracts entities + deduplicates
    channel_memory.add(
        [
            {"role": "user", "content": text},
            {"role": "assistant", "content": reply},
        ],
        scope,
    )
    return reply
```

The message list passed to the LLM is assembled in a specific order:

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

def build_messages(system_prompt, relevant_memories, recent_history, user_input):
    messages = [SystemMessage(content=system_prompt)]

    # Inject relevant memories as a second system message
    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(
            SystemMessage(content=f"Relevant context from earlier:\n{memory_context}")
        )

    # Append recent history, capped at MAX_HISTORY_MESSAGES entries
    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") == "user":
            messages.append(HumanMessage(content=entry["content"]))
        elif entry.get("role") == "assistant":
            messages.append(AIMessage(content=entry["content"]))

    # Current user message always last
    messages.append(HumanMessage(content=user_input))
    return messages
```

The two-system-message pattern keeps the bot's persona and instructions separate from the injected memory context, which is cleaner for the model to reason about.
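To exercise the handler flow described above without Slack, Mem0, or a real model, one option is a small in-memory test double. The sketch below is not part of slacktag-oss: `InMemoryMemory`, `handle`, and `echo_llm` are hypothetical names, the command strings `clear` and `memory` are assumed, and the keyword-overlap `search` is only a crude stand-in for Mem0's semantic retrieval. It mirrors the four-method interface and the scope-string format shown earlier.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class BaseMemory(ABC):
    """Same four-method shape as memory/base.py."""

    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...


class InMemoryMemory(BaseMemory):
    """Hypothetical test double: keyword overlap, NOT Mem0's semantic search."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def add(self, messages, scope_id):
        bucket = self._store.setdefault(scope_id, [])
        for message in messages:
            # Keep role/content and mirror the text under "memory",
            # the key that build_messages reads from search results.
            bucket.append({**message, "memory": message["content"]})

    def search(self, query, scope_id):
        words = set(query.lower().split())
        return [
            entry for entry in self._store.get(scope_id, [])
            if words & set(entry["memory"].lower().split())
        ]

    def get_all(self, scope_id):
        return list(self._store.get(scope_id, []))

    def clear(self, scope_id):
        self._store.pop(scope_id, None)


def scope_id(channel_id: str, thread_ts: Optional[str] = None) -> str:
    """Threads get an isolated scope; everything else shares the channel scope."""
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"
    return f"channel:{channel_id}"


def handle(memory: BaseMemory, llm: Callable[[str, list[dict]], str],
           channel_id: str, text: str, thread_ts: Optional[str] = None) -> str:
    scope = scope_id(channel_id, thread_ts)
    command = text.strip()
    if command == "clear":  # assumed command text
        memory.clear(scope)
        return "Memory cleared."
    if command == "memory":  # assumed command text
        listing = "\n".join(m["memory"] for m in memory.get_all(scope))
        return listing or "No memories yet."
    relevant = memory.search(text, scope)
    reply = llm(text, relevant)
    memory.add(
        [{"role": "user", "content": text}, {"role": "assistant", "content": reply}],
        scope,
    )
    return reply


def echo_llm(text: str, relevant: list[dict]) -> str:
    """Stand-in for the real model: reports how many memories it was given."""
    return f"({len(relevant)} memories) {text}"


memory = InMemoryMemory()
first = handle(memory, echo_llm, "C1", "rate limit is 100")
in_thread = handle(memory, echo_llm, "C1", "what is the rate limit", thread_ts="17.5")
in_channel = handle(memory, echo_llm, "C1", "what is the rate limit")
cleared = handle(memory, echo_llm, "C1", "clear")
```

The thread mention sees no memories because its scope string differs from the channel's, while the later channel mention sees both halves of the first exchange. Clearing the channel scope leaves the thread's memories untouched.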
slack-bolt makes event handling clean:

```python
# core/bot.py
from slack_bolt import App

app = App(
    token=settings.SLACK_BOT_TOKEN,
    signing_secret=settings.SLACK_SIGNING_SECRET,
)

@app.event("app_mention")
def on_mention(event, say):
    route_mention(event, say)  # channel / thread flow

@app.event("message")
def on_message(event, say):
    # DM flow; ignore the bot's own messages
    if event.get("channel_type") == "im" and not event.get("bot_id"):
        route_dm(event, say)
```

`router.py` extracts the relevant fields and calls the appropriate handler:

```python
# core/router.py
def route_mention(event, say):
    channel_id = event.get("channel")
    thread_ts = event.get("thread_ts")
    text = event.get("text", "")
    reply = handle_channel_mention(channel_id, event["user"], text, thread_ts)
    say(text=reply, thread_ts=thread_ts or event["ts"])
```

Replies always go back to the same thread: if the mention was in a thread, the bot stays in that thread.

All config lives in one place with validation:

```python
# config/settings.py
class Settings(BaseSettings):
    SLACK_BOT_TOKEN: str
    SLACK_APP_TOKEN: str
    SLACK_SIGNING_SECRET: str
    LLM_BASE_URL: str = "http://localhost:11434/v1"
    LLM_API_KEY: str = "ollama"
    LLM_MODEL: str = "llama3.2"
    MEM0_API_KEY: str
    BOT_NAME: str = "Claude"
    MAX_HISTORY_MESSAGES: int = 20
    SYSTEM_PROMPT: str = ""

    class Config:
        env_file = ".env"
```

Missing required fields (the Slack tokens, the Mem0 key) raise a `ValidationError` at startup, so the bot fails fast before any event processing begins.

```bash
# Get dependencies
pip install -r requirements.txt

# Start the bot (Socket Mode, no public URL needed)
python main.py
```

That's it. No Docker, no Qdrant, no ngrok. Invite the bot to a channel, @mention it, and it starts building memory from the first message.

Here's a realistic example.

Day 1:

User: @slacktag Our API rate limit is 100 req/min per tenant. Keep that in mind for capacity planning.

Bot: Got it. I'll factor that in for any capacity discussions.

Day 3 (hundreds of messages later in the channel):

User: @slacktag We're about to onboard 5 new enterprise tenants. Any concerns?
Bot: A few things to consider: with your current API rate limit of 100 req/min per tenant, 5 new enterprise tenants could significantly increase peak load. You may want to review your rate limiting strategy before onboarding...

Mem0 surfaced the rate limit fact from Day 1 because it was semantically relevant to the capacity question, even though it was nowhere in the recent message window.

For production, swap `SocketModeHandler` for a standard HTTP adapter:

```python
# Using Flask
from slack_bolt.adapter.flask import SlackRequestHandler
from flask import Flask, request

flask_app = Flask(__name__)
handler = SlackRequestHandler(app)

@flask_app.route("/slack/events", methods=["POST"])
def events():
    return handler.handle(request)
```

Point your Slack app's Request URL to `https://your-domain/slack/events`, deploy anywhere (Fly.io, Railway, Cloud Run all work), and you're done. No state in the server: Mem0 holds everything.

A few extensions that would make this significantly more powerful:

- Pluggable tools: `tools/registry.py` is stubbed out for LangChain tool integration. Adding web search (Tavily, Brave Search) or a code execution sandbox would turn this into a capable agent.
- Mem0 graph memory: Mem0 supports a graph mode that tracks relationships between entities across conversations. You could map out who's on which team, what projects are in flight, and surface that context automatically.
- Per-channel LLM config: let admins set a different model per channel (e.g., a powerful model for #architecture, a fast cheap model for #random).
- Reaction triggers: react with 🧠 to explicitly add a message to memory; react with 🗑️ to remove a fact. Much more controllable than pure auto-extraction.
- A `summarize` command: call `mem0.get_all()` and ask the LLM to produce a readable summary of everything it knows about this channel.

The codebase is intentionally small. `handler.py` is ~100 lines. Every module does one thing.
If you want to contribute:

```bash
git clone https://github.com/harishkotra/slacktag-oss
cd slacktag-oss
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
```

Pick any feature from the table in the README, implement it, and open a PR. The architecture is designed to stay simple: add without entangling.