Building a Hermes Memory Plugin for a Voice-Powered Conference Agent with Weaviate Engram🧠

A developer built a voice-powered conference agent using Hermes and extended its memory with Weaviate's Engram service. The agent answers visitor questions at booths and retains conversation history, allowing booth owners to later query visitor interests. Engram's memory system avoids duplicates and updates existing memories, solving Hermes' default memory limitations.

Recently, I have been attending a lot of conferences where booths showcase different projects. One thing I kept noticing was how often visitors approached a booth only to find nobody there to answer their questions. After seeing this happen several times, I started wondering: what if every booth had an AI agent capable of answering visitor questions and keeping track of interactions for the booth owners?

So I decided to build one using Hermes (https://hermes-agent.nousresearch.com/). But I quickly ran into a problem: memory. Hermes’ default memory system was designed for smaller, single-user interactions. I needed something that could retain information across many different visitors and conversations. There are multiple third-party memory plugins for Hermes, but then I came across Engram (https://weaviate.io/blog/engram-generally-available), Weaviate’s memory solution for AI agents. It looked like exactly what I needed, giving me the opportunity to both enhance my agent’s memory and put Engram to the test.

In this article, I will walk you through how I built a voice-enabled conference agent on top of Hermes, and how I extended its memory by building a memory plugin using Engram.

So here’s what I set out to build. The conference agent would sit at a booth on a laptop while visitors walk up and interact with it. The agent would answer questions about the project, try to get to know the visitors, and keep track of conversations. Later, the booth owners could come back and ask the agent questions about what particular visitors were interested in.

The conference agent needs a few things: an AI agent, an interface for attendees, a way for booth owners to reach the agent, and memory. For the AI agent, Hermes already had me covered. For the attendee interface, the Hermes CLI (https://hermes-agent.nousresearch.com/docs/user-guide/cli) was perfect since it already has a real-time voice mode (https://hermes-agent.nousresearch.com/docs/user-guide/features/voice-mode).
Then for the booth owners, the Hermes messaging gateway (https://hermes-agent.nousresearch.com/docs/user-guide/messaging/) makes it possible to communicate with the agent over Telegram and retrieve insights from conversations.

So Hermes already had almost everything covered except for one thing: memory. Hermes’ built-in memory system was designed around single-user interactions. It stores memory in two Markdown files: Memory.md for general facts and User.md for user-specific information. The problem is that this memory is very limited. Memory.md can only hold about 2,200 characters (roughly 800 tokens), while User.md holds around 1,375 characters (about 500 tokens). That simply isn’t enough for a conference booth interacting with potentially hundreds of visitors.

That’s where Engram (https://weaviate.io/blog/engram-deep-dive) comes in. Thanks to Hermes’ plugin system (https://hermes-agent.nousresearch.com/docs/user-guide/features/plugins), I could extend Hermes’ built-in memory and add Engram as a custom memory provider (https://hermes-agent.nousresearch.com/docs/developer-guide/memory-provider-plugin#adding-cli-commands).

Let’s briefly explore Engram. Engram is Weaviate’s managed memory and context service, purpose-built to help AI agents orchestrate workflows, learn from experience, and anchor decisions to trusted knowledge. When a user interacts with an agent, Engram extracts useful information from the conversation and commits it as memory for later use.

One thing that makes Engram stand out is that it doesn’t just keep adding new memory blindly. It can also retrieve existing memories and update them with new information. This helps avoid duplicates and keeps memory more accurate over time. That’s one of the main reasons Engram was a good fit for my project: I didn’t want a system that just keeps piling up redundant information.
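To make the difference between blind appending and updating in place concrete, here is a toy sketch. It is not how Engram works internally (this article does not cover that); `ToyMemoryStore` and the idea of keying memories by a topic string are hypothetical and exist purely for illustration.

```python
from typing import Dict, List


class ToyMemoryStore:
    """Hypothetical illustration only; not Engram's implementation."""

    def __init__(self) -> None:
        self.appended: List[str] = []       # naive log: every fact is kept
        self.upserted: Dict[str, str] = {}  # one entry per topic, latest wins

    def add(self, topic: str, fact: str) -> None:
        self.appended.append(fact)   # grows with every call
        self.upserted[topic] = fact  # replaces the older version


store = ToyMemoryStore()
store.add("paul/interest", "Paul is interested in vector search.")
store.add("paul/interest", "Paul is interested in vector search and agent memory.")

print(len(store.appended))  # the append-only log holds both versions
print(len(store.upserted))  # the upserting store holds only the latest one
```

With hundreds of visitors, the append-only list keeps growing with near-duplicates, while the update-in-place store stays at one entry per topic.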
Once memories are stored, they can be searched later and used by an AI agent or even in other pipelines like RAG systems.

Engram is generally available (https://weaviate.io/blog/engram-generally-available) to everyone. To get started, head over to the Weaviate Console (https://console.weaviate.cloud/) and sign up for the free tier. Once you’re in, navigate to the Engram dashboard and create an API key. Then install the Python SDK:

```bash
pip install weaviate-engram
```

Then create a client:

```python
from engram import EngramClient

client = EngramClient(api_key="your-api-key")
```

There are two main ways to add memory to Engram: using strings or conversations. With strings, you pass a single piece of text and Engram extracts useful information from it:

```python
run = client.memories.add(
    "Alice prefers async Python and avoids Java.",
    user_id="hermes",
)
```

Then for a conversation with an AI assistant:

```python
run = client.memories.add(
    [
        {"role": "user", "content": "What's the best way to handle retries?"},
        {"role": "assistant", "content": "Exponential backoff with jitter is the standard approach."},
        {"role": "user", "content": "Got it, I'll use that in my HTTP client."},
    ],
    user_id="hermes",
)
```

The `user_id` scopes memory to a specific user, so you can easily separate and retrieve memories per person. You can then search stored memories like this:

```python
results = client.memories.search(
    query="What does Alice think about Python?",
    user_id="hermes",
)
for memory in results:
    print(memory.content)
```

This lets you retrieve only the most relevant memories for a given user.

Now that we understand how Engram works, let’s build the plugin. In Hermes, every memory plugin inherits from the `MemoryProvider` class. This is an abstract base class, which means we need to implement the methods we want to use.
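If abstract base classes are unfamiliar, a small stand-alone sketch of the pattern may help. `ToyProvider` below is a hypothetical stand-in written with Python’s standard `abc` module, not Hermes’ actual `MemoryProvider`; which methods Hermes marks as abstract is an assumption here.

```python
from abc import ABC, abstractmethod


class ToyProvider(ABC):
    """Hypothetical stand-in for an abstract provider base class."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    def on_session_end(self, messages: list) -> None:
        """Optional hook: subclasses override it only if they need it."""


class Incomplete(ToyProvider):
    pass  # implements none of the abstract methods


class Complete(ToyProvider):
    @property
    def name(self) -> str:
        return "toy"

    def is_available(self) -> bool:
        return True


try:
    Incomplete()
    instantiated = True
except TypeError:
    instantiated = False  # Python refuses: abstract methods are missing

provider = Complete()
print(instantiated, provider.name)
```

The required methods must be implemented before the class can be instantiated, while optional hooks fall back to the base class’s do-nothing default.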
A memory plugin consists of two main files:

- `__init__.py`: contains the actual plugin implementation
- `plugin.yaml`: defines the plugin metadata

Here’s how we are going to implement the plugin. After each session with a user, the plugin will store the full conversation in Engram. Engram will then extract the relevant memories from it automatically. We’ll also expose a tool that allows the agent to search Engram and retrieve relevant memories when needed.

Let’s start implementing it. Below is the basic plugin structure without the logic implemented yet. This code will live in `__init__.py`.

```python
import json
import os
from typing import Any, Dict, List

from agent.memory_provider import MemoryProvider
from engram import EngramClient


class Engram(MemoryProvider):
    @property
    def name(self) -> str:
        return "engram"

    def is_available(self) -> bool:
        # The provider is only usable when an API key is configured.
        return bool(os.environ.get("ENGRAM_API_KEY"))

    def initialize(self, session_id: str, **kwargs) -> None:
        pass

    def get_config_schema(self):
        pass

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        pass

    def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
        pass

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        pass

    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
        pass
```

Here are the methods that will be implemented:

- `initialize`: called when the plugin is loaded
- `get_config_schema`: defines the configuration the plugin needs
- `on_memory_write`: triggered whenever Hermes writes memory to its Markdown files
- `on_session_end`: triggered when a conversation session ends
- `get_tool_schemas`: exposes the plugin’s tool schemas to the agent
- `handle_tool_call`: runs whenever the agent calls one of the plugin’s tools

Let’s start by implementing the `initialize` and `get_config_schema` methods.

```python
def initialize(self, session_id: str, **kwargs) -> None:
    self._client = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
    self._user_id = session_id
```

This method is called when the plugin is loaded. Here we initialize the Engram client using the API key stored in the environment variables. We also store the `session_id`, which will be used as the `user_id` when storing and searching memories.
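One way to sanity-check this logic without an API key or network access is to swap in a stand-in client. In the sketch below, `FakeEngramClient` and `ToyEngramProvider` are hypothetical; only the `ENGRAM_API_KEY` variable name and the idea of using the session ID as the user ID come from the plugin.

```python
import os


class FakeEngramClient:
    """Hypothetical stand-in; records the key instead of connecting."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


class ToyEngramProvider:
    """Mirrors the plugin's is_available/initialize logic for local checks."""

    def is_available(self) -> bool:
        return bool(os.environ.get("ENGRAM_API_KEY"))

    def initialize(self, session_id: str, **kwargs) -> None:
        # os.environ[...] raises KeyError when the key is missing,
        # so is_available() should be checked first.
        self._client = FakeEngramClient(api_key=os.environ["ENGRAM_API_KEY"])
        self._user_id = session_id


# Start from a clean state (this only affects the current process).
os.environ.pop("ENGRAM_API_KEY", None)
provider = ToyEngramProvider()
available_before = provider.is_available()

os.environ["ENGRAM_API_KEY"] = "test-key"
available_after = provider.is_available()
provider.initialize(session_id="session-123")
print(available_before, available_after, provider._user_id)
```

Because each session gets its own ID, every later `add` or `search` made through this provider is scoped to that one visitor.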
Next is the configuration schema:

```python
def get_config_schema(self):
    return [
        {
            "key": "api_key",
            "description": "Engram API key",
            "secret": True,
            "required": True,
            "env_var": "ENGRAM_API_KEY",
            "url": "https://console.weaviate.cloud/engram",
        }
    ]
```

In `get_config_schema`, we define the configuration needed by the plugin. In this case, the plugin requires an `ENGRAM_API_KEY`.

The `on_memory_write` and `on_session_end` methods are hooks connected to Hermes’ event system. Whenever Hermes writes memory to its Markdown files, it triggers the `on_memory_write` hook. In this method, we send that memory directly to Engram.

```python
def on_memory_write(self, action: str, target: str, content: str) -> None:
    self._client.memories.add(content, user_id=self._user_id)
```

One important thing to note is that Hermes still uses its built-in Markdown memory system alongside an external provider. We can use this to keep Engram continuously updated with the memories Hermes writes locally.

The `on_session_end` hook is triggered when a conversation session ends. Here, we store the entire conversation between the user and the agent.

```python
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
    # Keep only user and assistant turns; other message roles are skipped.
    parsed_message = []
    for message in messages:
        if message['role'] == 'user':
            parsed_message.append({'role': 'user', 'content': message['content']})
        if message['role'] == 'assistant':
            parsed_message.append({'role': 'assistant', 'content': message['content']})
    self._client.memories.add(parsed_message, user_id=self._user_id)
```

Both hooks use the session ID as the `user_id`. I decided to do it this way so that every visitor interacting with the booth agent gets their own dedicated memory scope. This keeps memories grouped per visitor instead of mixing conversations together.

For the agent to search memories stored in Engram, we first need to define a tool schema.
```python
SEARCH_SCHEMA = {
    "name": "engram_search",
    "description": "Search memories in engram",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for in engram's memory.",
            },
            "user_id": {
                "type": "string",
                "description": "The user ID to search memories for.",
            },
        },
        "required": ["query", "user_id"],
    },
}
```

Next, we expose the tool schema through `get_tool_schemas`:

```python
def get_tool_schemas(self) -> List[Dict[str, Any]]:
    return [SEARCH_SCHEMA]
```

Finally, we implement `handle_tool_call`, which runs whenever the agent calls the tool.

```python
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
    if tool_name == "engram_search":
        query = args["query"]
        user_id = args["user_id"]
        results = self._client.memories.search(query=query, user_id=user_id)
        text = []
        for result in results:
            text.append(result.content)
        return json.dumps({"result": "\n".join(text)})
    return json.dumps({"error": f"Unknown tool {tool_name}"})
```

After implementing the Engram memory provider, we can register it as a memory plugin at the bottom of the file:

```python
def register(ctx) -> None:
    """Called by the memory plugin discovery system."""
    ctx.register_memory_provider(Engram())
```

This allows Hermes to discover and load our plugin automatically.
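The tool handler can be exercised in the same way the agent would call it. The sketch below copies the `handle_tool_call` logic into a plain function and feeds it a hypothetical `FakeMemories` stub in place of `client.memories`; the stub and its canned result are invented for illustration and ignore the query text entirely.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class FakeMemory:
    content: str


class FakeMemories:
    """Hypothetical stub with a search() shaped like the call used above."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self._data = data

    def search(self, query: str, user_id: str) -> List[FakeMemory]:
        # Ignores the query; returns everything stored for user_id.
        return [FakeMemory(text) for text in self._data.get(user_id, [])]


def handle_tool_call(memories: FakeMemories, tool_name: str, args: Dict[str, Any]) -> str:
    if tool_name == "engram_search":
        results = memories.search(query=args["query"], user_id=args["user_id"])
        return json.dumps({"result": "\n".join(r.content for r in results)})
    return json.dumps({"error": f"Unknown tool {tool_name}"})


memories = FakeMemories({"session-123": ["Paul asked about Engram pricing."]})
ok = json.loads(
    handle_tool_call(
        memories, "engram_search", {"query": "pricing", "user_id": "session-123"}
    )
)
bad = json.loads(handle_tool_call(memories, "other_tool", {}))
print(ok)
print(bad)
```

Returning a JSON string for both the success and the error case means the agent always gets something parseable back, even for a tool name the plugin does not recognize.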
Here’s the complete `__init__.py` file:

```python
import json
import os
from typing import Any, Dict, List

from agent.memory_provider import MemoryProvider
from engram import EngramClient

SEARCH_SCHEMA = {
    "name": "engram_search",
    "description": "Search memories in engram",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for in engram's memory.",
            },
            "user_id": {
                "type": "string",
                "description": "The user ID to search memories for.",
            },
        },
        "required": ["query", "user_id"],
    },
}


class Engram(MemoryProvider):
    @property
    def name(self) -> str:
        return "engram"

    def is_available(self) -> bool:
        return bool(os.environ.get("ENGRAM_API_KEY"))

    def initialize(self, session_id: str, **kwargs) -> None:
        self._client = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
        self._user_id = session_id

    def get_config_schema(self):
        return [
            {
                "key": "api_key",
                "description": "Engram API key",
                "secret": True,
                "required": True,
                "env_var": "ENGRAM_API_KEY",
                "url": "https://console.weaviate.cloud/engram",
            }
        ]

    def on_memory_write(self, action: str, target: str, content: str) -> None:
        self._client.memories.add(content, user_id=self._user_id)

    def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
        parsed_message = []
        for message in messages:
            if message['role'] == 'user':
                parsed_message.append({'role': 'user', 'content': message['content']})
            if message['role'] == 'assistant':
                parsed_message.append({'role': 'assistant', 'content': message['content']})
        self._client.memories.add(parsed_message, user_id=self._user_id)

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        return [SEARCH_SCHEMA]

    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
        if tool_name == "engram_search":
            query = args["query"]
            user_id = args["user_id"]
            results = self._client.memories.search(query=query, user_id=user_id)
            text = []
            for result in results:
                text.append(result.content)
            return json.dumps({"result": "\n".join(text)})
        return json.dumps({"error": f"Unknown tool {tool_name}"})


def register(ctx) -> None:
    """Called by the memory plugin discovery system."""
    ctx.register_memory_provider(Engram())
```

Next, we can create the `plugin.yaml` file. This file stores the plugin metadata and defines the hooks used by the plugin.

```yaml
name: engram
version: 1.0.0
description: "Engram is a fully managed memory service by Weaviate. It lets you add persistent, personalized memory to AI assistants and agents."
pip_dependencies:
  - weaviate-engram
hooks:
  - on_session_end
  - on_memory_write
```

Now that the memory plugin is implemented, let’s set it up in Hermes. First, place both the `__init__.py` and `plugin.yaml` files inside a folder called `engram`. Your directory should look like this:

```
engram/
├── __init__.py
└── plugin.yaml
```

Next, move the `engram` directory into the Hermes plugins directory:

```bash
mv engram ~/.hermes/plugins/
```

Hermes automatically discovers plugins from this location. Now enable the plugin by running:

```bash
hermes plugins enable engram
```

Next, run:

```bash
hermes memory
```

This lets you confirm that `engram` now appears as one of the available memory providers. After that, run the setup command:

```bash
hermes memory setup
```

You’ll be prompted with a list of available memory providers. Select `engram` and provide your Engram API key when asked. Once setup is complete, run:

```bash
hermes memory
```

It should now show Engram configured as the active memory provider.

Now that the Engram plugin is working, let’s test it out. Before launching Hermes, we first need to modify the SOUL.md file located at `~/.hermes/SOUL.md`. This file defines Hermes’ personality and behavior. For this demo, I want Hermes to behave like an AI agent stationed at a Weaviate booth, showcasing Engram at a conference. I also want it to treat every conversation as a completely new interaction.
Here’s the modified SOUL.md file:

```markdown
# Hermes Agent Persona

You are Hermes, an AI agent representing Weaviate at a conference booth. Your role is to help attendees learn about Weaviate and answer questions about its products, especially Engram, Weaviate’s memory product.

Treat every conversation as if you are speaking to a new attendee for the first time. Be warm, friendly, and approachable. Start by introducing yourself, asking for the person’s name, and asking what they would like to know about Engram or Weaviate.

Your goal is to clearly explain concepts, answer questions accurately, and help people understand how Engram can be used in real world AI applications. Keep your responses conversational, engaging, and easy to understand regardless of the attendee’s technical background.
```

With that in place, we can launch the Hermes CLI and start chatting with the agent. I took on the persona of “Paul” and had a conversation with Hermes. After the interaction, I closed the session using Ctrl + C.

When I opened the Engram dashboard, I could see that memories from the conversation had been successfully stored. I could also browse memories from other sessions and confirm that each visitor’s interactions were being stored separately. This means Hermes can now retain information across multiple users and conversations instead of relying only on short-term Markdown memory.

With Engram memory now integrated into Hermes, the conference agent is almost ready. The next thing we need to set up is voice support. There are different approaches to handling voice (https://hermes-agent.nousresearch.com/docs/user-guide/features/voice-mode) in Hermes. You could use cloud models for both speech-to-text and text-to-speech, or you could run everything locally. I decided to go with local models.
Here is how I set it up. First, I installed the required Python dependencies:

```bash
pip install "hermes-agent[voice]"
pip install -U "neutts[all]"
pip install sounddevice numpy
```

NeuTTS provides local text-to-speech capabilities. Next, I installed the system dependencies required for audio processing:

```bash
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng  # required for NeuTTS
```

Once everything is installed, Hermes can be launched in the terminal and voice mode enabled with:

```
/voice on
```

To enable text-to-speech:

```
/voice tts
```

The last feature to add is the messaging gateway (https://hermes-agent.nousresearch.com/docs/user-guide/messaging/), which allows booth owners to message the agent and retrieve insights from ongoing conversations with attendees. They can ask questions, monitor interactions, and extract information even when they are not physically at the booth, without interrupting attendees interacting with the agent.

Hermes supports multiple messaging platforms (https://hermes-agent.nousresearch.com/docs/user-guide/messaging/#platform-comparison). In this case, I used Telegram as the primary interface for interacting with the agent. Follow the official Hermes guide to set up Telegram (https://hermes-agent.nousresearch.com/docs/user-guide/messaging/telegram) as your messaging platform, or choose another supported option. Once configured, enable the gateway (https://hermes-agent.nousresearch.com/docs/user-guide/messaging/telegram#start-the-gateway) and you can start interacting with the agent remotely.

The image above shows how I used the `engram_search` tool to retrieve memory for a specific user using their session ID. Since session IDs are only accessible to authorized admins via the Engram dashboard, this keeps the data private while still allowing useful insights for booth owners.
With the conference agent complete, let’s recap what we have built: a Hermes agent with an Engram memory plugin, local voice support for attendees, and a Telegram gateway for booth owners. While this agentic setup was built with conference booths in mind, the memory plugin itself is not tied to them and can be reused in other projects.

You can get the full code for this article here: Hermes Engram Plugin (https://github.com/Studio1HQ/Hermes_Engram_Plugin)

What started as a simple idea for a better conference booth experience turned into a full AI agent with persistent memory, real-time interaction, and voice capabilities. With Hermes (https://github.com/nousresearch/hermes-agent) handling conversations, Engram (https://weaviate.io/blog/engram-generally-available) managing long-term memory, and voice support making interactions feel natural, the agent is no longer just answering questions; it’s actively remembering people, conversations, and context across sessions.

While this solution was built with a conference setting in mind, the memory plugin we developed can be used far beyond that. From personal AI assistants to more complex multi-user systems, the same approach can help any agent retain meaningful context over time and deliver more useful, personalized interactions.