How I built an open-source Slack assistant with persistent semantic memory, powered by any LLM and Mem0's managed memory layer, with no vector database required.
Most AI Slack bots have the memory of a goldfish. Every conversation starts from scratch. You ask it about your sprint goals, it gives a great answer, then three days later you ask a follow-up and it has no idea what you're talking about. You end up re-explaining context constantly.
The commercial solution to this is Claude Tag, a Slack integration that maintains genuine conversational continuity. But it's tied to one provider and not open-source.
slacktag-oss is our attempt to replicate that experience: a Slack bot with real, semantic, persistent memory that works with any LLM, including ones running entirely on your laptop.
A Python Slack bot with:
- !clear and !memory commands

Before diving into code, here's the full request lifecycle:
┌─────────────────────────────────────────────────────────────┐
│                            Slack                            │
│  @mention in channel ──┐                                    │
│  DM to bot ────────────┼──► Slack Events API                │
│  Thread reply ─────────┘           │                        │
└─────────────────────────────────────────────────────────────┘
                                     │ (Socket Mode / HTTP)
                                     ▼
┌─────────────────────────────────────────────────────────────┐
│                    slack-bolt (Python)                      │
│       bot.py ──► router.py ──► handler.py                   │
│                                    │                        │
│                    ┌───────────────┤                        │
│                    │               │                        │
│                    ▼               ▼                        │
│               Mem0 Client      LangChain                    │
│                (managed)       ChatOpenAI                   │
└─────────────────────────────────────────────────────────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │  Mem0 Managed Cloud   │
         │   Vector Embeddings   │
         │   Entity Extraction   │
         │     Deduplication     │
         └───────────────────────┘
The key design decision: Mem0 is the only stateful dependency. There's no database to manage, no Redis, no Qdrant. The bot process itself is stateless, so you can restart it freely without losing any memory.
slacktag-oss/
├── main.py
├── config/settings.py        ← Pydantic settings from .env
├── core/
│   ├── bot.py                ← Slack Bolt app + event registration
│   ├── handler.py            ← All orchestration logic lives here
│   └── router.py             ← Dispatches channel mentions vs DMs
├── memory/
│   ├── base.py               ← Abstract interface
│   ├── channel_memory.py     ← Channel + thread scoped memory
│   ├── dm_memory.py          ← Per-user private memory
│   └── mem0_store.py         ← Mem0 client factory
├── llm/client.py             ← ChatOpenAI factory
└── tools/registry.py         ← Tool plugin stub (v2)
The typical approach to bot memory is a rolling window: keep the last N messages in the prompt. This breaks down fast: context gets stale, important things fall out of the window, and token costs grow linearly with the window size.
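Mem0 takes a different approach. When you store a conversation, it runs the three steps shown in the Mem0 box of the diagram above: entity extraction, vector embedding, and deduplication.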
When you later ask a question, you get back the most relevant past memories, not just the most recent ones. A user's preference mentioned three weeks ago will surface when relevant, even if hundreds of messages happened in between.
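To make the difference concrete, here is a toy comparison of a rolling window against relevance-based lookup. It is only a sketch: the scoring is plain word overlap, which I picked for illustration, whereas Mem0 uses vector embeddings. The function names are hypothetical and not part of slacktag-oss.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rolling_window(history: list[str], n: int) -> list[str]:
    """Classic approach: keep only the last n messages."""
    return history[-n:]


def most_relevant(history: list[str], query: str, k: int) -> list[str]:
    """Toy relevance search: rank messages by word overlap with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(msg)), idx, msg) for idx, msg in enumerate(history)]
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [msg for _, _, msg in scored[:k]]


# One important fact followed by a lot of unrelated chatter.
history = ["Our API rate limit is 100 req/min per tenant."]
history += [f"Unrelated chatter number {i}" for i in range(50)]

query = "What is the rate limit per tenant?"
window = rolling_window(history, 20)
relevant = most_relevant(history, query, 1)

print(history[0] in window)  # False: the fact fell out of the window
print(relevant)              # the rate limit message is still found
```

The window only ever sees the tail of the conversation, while the relevance lookup scans everything stored and keeps what matches the question.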
Because we're using Mem0's managed cloud, the entire backend is one short factory function:

from mem0 import MemoryClient
from config.settings import settings


def get_mem0_client() -> MemoryClient:
    # The managed service only needs an API key; storage and embeddings live on Mem0's side.
    return MemoryClient(api_key=settings.MEM0_API_KEY)

No vector database config. No embedding model to choose. No collection names to manage.
The key insight for a Slack bot is that different conversations need different memory boundaries:

from typing import Optional

# memory/channel_memory.py (method on ChannelMemory)
def scope_id(self, channel_id: str, thread_ts: Optional[str] = None) -> str:
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"  # isolated thread
    return f"channel:{channel_id}"                 # shared channel

# memory/dm_memory.py (method on DMMemory)
def scope_id(self, user_id: str) -> str:
    return f"dm:{user_id}"                         # private per user

Mem0 uses this string as a user_id: anything stored under channel:C12345 is shared by everyone in that channel. Anything under dm:U67890 is private. Thread memory is completely isolated, so a debugging session in a thread doesn't pollute the main channel's memory.
Both ChannelMemory and DMMemory implement the same four-method interface:

from abc import ABC, abstractmethod


class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...

This makes it easy to swap backends later: implement BaseMemory, update the factory, done.
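As a rough illustration of what a swapped-in backend could look like, here is a dictionary-backed implementation of the same interface. Treat it as a sketch for local tests: InMemoryMemory is a hypothetical name, the "memory" key mirrors what build_messages reads later in this post, and the keyword search is a crude stand-in for Mem0's semantic search.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class BaseMemory(ABC):
    """The four-method interface from the article, repeated so this runs alone."""

    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...


class InMemoryMemory(BaseMemory):
    """Hypothetical backend: one list of entries per scope, lost on restart."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = defaultdict(list)

    def add(self, messages: list[dict], scope_id: str) -> None:
        for msg in messages:
            self._store[scope_id].append(
                {"memory": msg["content"], "role": msg["role"]}
            )

    def search(self, query: str, scope_id: str) -> list[dict]:
        # Naive keyword match, not semantic search.
        words = set(query.lower().split())
        return [
            entry
            for entry in self._store.get(scope_id, [])
            if words & set(entry["memory"].lower().split())
        ]

    def get_all(self, scope_id: str) -> list[dict]:
        return list(self._store.get(scope_id, []))

    def clear(self, scope_id: str) -> None:
        self._store.pop(scope_id, None)


memory = InMemoryMemory()
channel = "channel:C12345"
thread = "thread:C12345:1700000000.000100"

memory.add([{"role": "user", "content": "Deploys happen on Fridays"}], channel)
memory.add([{"role": "user", "content": "Stack trace from the login bug"}], thread)

channel_hits = memory.search("when are deploys", channel)
thread_hits = memory.search("deploys", thread)
print(len(channel_hits), len(thread_hits))  # 1 0
```

The last two lines also show the scoping at work: the thread never sees what was stored under the channel scope, and vice versa.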

from langchain_openai import ChatOpenAI
from config.settings import settings


def get_llm() -> ChatOpenAI:
    # Any OpenAI-compatible endpoint works; everything is driven by settings.
    return ChatOpenAI(
        base_url=settings.LLM_BASE_URL,
        api_key=settings.LLM_API_KEY,
        model=settings.LLM_MODEL,
        temperature=0.7,
        streaming=True,
    )

base_url, together with the API key and model name, is the only thing that changes between providers, and all three come from .env. Ollama, LM Studio, OpenAI, Groq, Together AI: all work without touching any other code.
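For reference, here is a small lookup of OpenAI-compatible base URLs for the providers named above. Only the Ollama URL comes from this project's defaults; the others are the endpoints those providers commonly document, so check them against current provider docs before relying on them. PROVIDER_BASE_URLS and env_for are hypothetical helpers, not part of slacktag-oss.

```python
# Assumed OpenAI-compatible endpoints; verify against each provider's docs.
PROVIDER_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",       # default used in this project
    "lmstudio": "http://localhost:1234/v1",      # LM Studio's local server default
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}


def env_for(provider: str, model: str, api_key: str) -> dict[str, str]:
    """Build the three LLM_* values that would go into .env."""
    return {
        "LLM_BASE_URL": PROVIDER_BASE_URLS[provider],
        "LLM_API_KEY": api_key,
        "LLM_MODEL": model,
    }


# Local Ollama, matching the defaults in config/settings.py.
for key, value in env_for("ollama", "llama3.2", "ollama").items():
    print(f"{key}={value}")
```

Switching providers then means changing those three lines in .env and restarting the bot.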
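handler.py is the heart of the bot. For every request, it resolves the memory scope, handles the !clear and !memory commands, searches Mem0 for relevant memories, builds the prompt, calls the LLM, and stores the new exchange: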
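
import re

# Slack includes the mention token (e.g. "<@U0123ABCD>") in app_mention text.
LEADING_MENTION = re.compile(r"^\s*<@[^>]+>\s*")


def handle_channel_mention(channel_id, user_id, text, thread_ts=None):
    scope = channel_memory.scope_id(channel_id, thread_ts)

    # Drop the leading mention so "!clear" and "!memory" can match exactly.
    text = LEADING_MENTION.sub("", text, count=1).strip()

    if text == "!clear":
        channel_memory.clear(scope)
        return "Memory cleared."
    if text == "!memory":
        return format_memories(channel_memory.get_all(scope))

    # Semantically relevant memories, plus everything stored for this scope.
    relevant = channel_memory.search(text, scope)
    history = channel_memory.get_all(scope)

    messages = build_messages(system_prompt, relevant, history, text)
    response = llm.invoke(messages)
    reply = response.content

    # Store the exchange so Mem0 can extract facts from it.
    channel_memory.add(
        [{"role": "user", "content": text},
         {"role": "assistant", "content": reply}],
        scope,
    )
    return reply
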
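The handler calls format_memories, which isn't shown in this post. Here is one way it might look. This is a guess at the shape, not the project's actual helper: it assumes each memory is a dict with a "memory" key (the same key build_messages reads), and the limit parameter is my own addition.

```python
def format_memories(memories: list[dict], limit: int = 20) -> str:
    """Render stored memories as a short bullet list for a Slack reply.

    Hypothetical sketch: assumes each entry carries its text under "memory".
    """
    facts = [m["memory"] for m in memories if m.get("memory")]
    if not facts:
        return "No memories stored for this conversation yet."

    lines = [f"• {fact}" for fact in facts[:limit]]
    if len(facts) > limit:
        lines.append(f"...and {len(facts) - limit} more.")
    return "\n".join(lines)


print(format_memories([
    {"memory": "API rate limit is 100 req/min per tenant"},
    {"memory": "Team deploys on Fridays"},
]))
```

Capping the list keeps the !memory reply readable in a busy channel.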
The message list passed to the LLM is assembled in a specific order:

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage


def build_messages(system_prompt, relevant_memories, recent_history, user_input):
    # 1. Persona and instructions.
    messages = [SystemMessage(content=system_prompt)]

    # 2. Memories that Mem0 judged relevant to this query.
    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(SystemMessage(
            content=f"Relevant context from earlier:\n{memory_context}"
        ))

    # 3. The most recent turns (MAX_HISTORY_MESSAGES comes from settings).
    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") == "user":
            messages.append(HumanMessage(content=entry["content"]))
        elif entry.get("role") == "assistant":
            messages.append(AIMessage(content=entry["content"]))

    # 4. The new user message always goes last.
    messages.append(HumanMessage(content=user_input))
    return messages

The two-system-message pattern keeps the bot's persona and instructions separate from the injected memory context, which is cleaner for the model to reason about.
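To see the resulting order on a concrete input, here is a dependency-free variant of the same assembly. It uses plain role/content dicts instead of LangChain message classes, purely so it runs without extra installs, and MAX_HISTORY_MESSAGES is set to 2 only to make the trimming visible.

```python
MAX_HISTORY_MESSAGES = 2  # tiny on purpose, to show the trimming


def build_messages(system_prompt, relevant_memories, recent_history, user_input):
    """Same ordering as the article's version, with dicts instead of LangChain types."""
    messages = [{"role": "system", "content": system_prompt}]

    facts = [m["memory"] for m in relevant_memories if "memory" in m]
    if facts:
        messages.append({
            "role": "system",
            "content": "Relevant context from earlier:\n" + "\n".join(facts),
        })

    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") in ("user", "assistant"):
            messages.append({"role": entry["role"], "content": entry["content"]})

    messages.append({"role": "user", "content": user_input})
    return messages


messages = build_messages(
    "You are a helpful Slack assistant.",
    [{"memory": "API rate limit is 100 req/min per tenant"}],
    [
        {"role": "user", "content": "an older message"},
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ],
    "Any concerns about onboarding 5 tenants?",
)

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'system', 'user', 'assistant', 'user']
```

The older message is trimmed by the history cap, yet the rate limit fact still reaches the model through the second system message.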
slack-bolt makes event handling clean:

from slack_bolt import App

app = App(token=settings.SLACK_BOT_TOKEN, signing_secret=settings.SLACK_SIGNING_SECRET)


@app.event("app_mention")
def on_mention(event, say):
    route_mention(event, say)  # channel / thread flow


@app.event("message")
def on_message(event, say):
    # Only handle direct messages, and ignore the bot's own messages.
    if event.get("channel_type") == "im" and not event.get("bot_id"):
        route_dm(event, say)  # DM flow

router.py extracts the relevant fields and calls the appropriate handler:

def route_mention(event, say):
    channel_id = event.get("channel")
    thread_ts = event.get("thread_ts")  # absent for top-level messages
    text = event.get("text", "")
    reply = handle_channel_mention(channel_id, event["user"], text, thread_ts)
    # Reply in the existing thread, or start one under the mention itself.
    say(text=reply, thread_ts=thread_ts or event["ts"])

Replies always go back to a thread: if the mention was in a thread, the bot stays in that thread; otherwise it starts one under the mention.
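The `thread_ts or event["ts"]` fallback is easy to check in isolation. In this sketch the event dicts are trimmed-down examples with made-up IDs and timestamps, and reply_thread_ts is a hypothetical helper that just names the expression used in route_mention.

```python
def reply_thread_ts(event: dict) -> str:
    """Pick the thread to reply in: the existing thread, else the message itself."""
    return event.get("thread_ts") or event["ts"]


# A top-level mention: no thread_ts, so the reply starts a thread under it.
top_level = {
    "channel": "C12345",
    "ts": "1700000000.000100",
    "text": "<@U0BOT> what's our rate limit?",
}

# A mention inside that thread: thread_ts points at the parent message.
in_thread = {
    "channel": "C12345",
    "ts": "1700000050.000200",
    "thread_ts": "1700000000.000100",
    "text": "<@U0BOT> and for enterprise tenants?",
}

print(reply_thread_ts(top_level))  # 1700000000.000100
print(reply_thread_ts(in_thread))  # 1700000000.000100
```

One consequence, if I read the handler right: the top-level mention is stored under the channel scope, while follow-ups inside the resulting thread carry a thread_ts and land in the isolated thread scope.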
All config lives in one place with validation:
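
# pydantic v2 with pydantic-settings; on pydantic v1 use: from pydantic import BaseSettings
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Required: no defaults, so startup fails if they are missing.
    SLACK_BOT_TOKEN: str
    SLACK_APP_TOKEN: str
    SLACK_SIGNING_SECRET: str
    MEM0_API_KEY: str

    # Optional: defaults target a local Ollama server.
    LLM_BASE_URL: str = "http://localhost:11434/v1"
    LLM_API_KEY: str = "ollama"
    LLM_MODEL: str = "llama3.2"
    BOT_NAME: str = "Claude"
    MAX_HISTORY_MESSAGES: int = 20
    SYSTEM_PROMPT: str = ""

    class Config:
        env_file = ".env"


settings = Settings()  # imported elsewhere as config.settings.settings
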
Missing required fields (the Slack tokens, the Mem0 key) raise a ValidationError at startup, so the bot fails fast before any event processing begins.
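If you want to see the fail-fast idea without pydantic, the same check can be sketched with the standard library alone. This is not how the project does it, just an illustration. The REQUIRED tuple mirrors the fields without defaults in Settings, and missing_settings is a hypothetical helper. Unlike pydantic, this version also treats empty strings as missing.

```python
REQUIRED = (
    "SLACK_BOT_TOKEN",
    "SLACK_APP_TOKEN",
    "SLACK_SIGNING_SECRET",
    "MEM0_API_KEY",
)


def missing_settings(env: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty in env."""
    return [name for name in REQUIRED if not env.get(name)]


def check_or_exit(env: dict[str, str]) -> None:
    """Stop before any Slack connection is opened if config is incomplete."""
    missing = missing_settings(env)
    if missing:
        raise SystemExit("Missing required settings: " + ", ".join(missing))


# Example with placeholder values; in a real run you would pass os.environ.
example_env = {"SLACK_BOT_TOKEN": "xoxb-placeholder", "MEM0_API_KEY": ""}
print(missing_settings(example_env))
```

Either way, the point is the same: a misconfigured bot should die with a clear message at startup rather than on the first incoming event.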
pip install -r requirements.txt
python main.py
That's it. No Docker, no Qdrant, no ngrok. Invite the bot to a channel, @mention it, and it starts building memory from the first message.
Here's a realistic example. Day 1:
User: @slacktag Our API rate limit is 100 req/min per tenant. Keep that in mind for capacity planning.
Bot: Got it. I'll factor that in for any capacity discussions.
Day 3 (hundreds of messages later in the channel):
User: @slacktag We're about to onboard 5 new enterprise tenants. Any concerns?
Bot: A few things to consider: with your current API rate limit of 100 req/min per tenant, 5 new enterprise tenants could significantly increase peak load. You may want to review your rate limiting strategy before onboarding...
Mem0 surfaced the rate limit fact from Day 1 because it was semantically relevant to the capacity question, even though it was nowhere in the recent message window.
For production, swap SocketModeHandler for a standard HTTP adapter:

from flask import Flask, request
from slack_bolt.adapter.flask import SlackRequestHandler

flask_app = Flask(__name__)
handler = SlackRequestHandler(app)  # app is the Bolt App from core/bot.py


@flask_app.route("/slack/events", methods=["POST"])
def events():
    # Bolt verifies the request signature and dispatches to the registered listeners.
    return handler.handle(request)

Point your Slack app's Request URL to https://your-domain/slack/events, deploy anywhere (Fly.io, Railway, and Cloud Run all work), and you're done. No state lives in the server; Mem0 holds everything.
A few extensions that would make this significantly more powerful:
Pluggable tools: tools/registry.py is stubbed out for LangChain tool integration. Adding web search (Tavily, Brave Search) or a code execution sandbox would turn this into a capable agent.
Mem0 graph memory: Mem0 supports a graph mode that tracks relationships between entities across conversations. You could map out who's on which team, what projects are in flight, and surface that context automatically.
Per-channel LLM config: let admins set a different model per channel (e.g., a powerful model for #architecture, a fast cheap model for #random).
Reaction triggers: react with 🧠 to explicitly add a message to memory; react with 🗑️ to remove a fact. Much more controllable than pure auto-extraction.
!summarize: call mem0.get_all() and ask the LLM to produce a readable summary of everything it knows about this channel.

The codebase is intentionally small. handler.py is ~100 lines. Every module does one thing. If you want to contribute:
git clone https://github.com/harishkotra/slacktag-oss
cd slacktag-oss
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Pick any feature from the table in the README, implement it, and open a PR. The architecture is designed to stay simple: add without entangling.