# Building a Hermes Memory Plugin for a Voice-Powered Conference Agent with Weaviate Engram🧠

> Source: <https://dev.to/astrodevil/building-a-hermes-memory-plugin-for-a-voice-powered-conference-agent-with-weaviate-engram-39jj>
> Published: 2026-06-17 16:44:00+00:00

Recently, I have been attending a lot of conferences where different booths showcase different projects. One thing I kept noticing was how often visitors approached a booth only to find nobody there to answer their questions.

After seeing this happen several times, I started wondering: what if every booth had an AI agent capable of answering visitor questions and keeping track of interactions for the booth owners?

So I decided to build one using [Hermes](https://hermes-agent.nousresearch.com/).

But I quickly ran into a problem: **memory**.

Hermes’ default memory system was designed for smaller, single-user interactions. I needed something that could retain information across many different visitors and conversations.

There are multiple third-party memory plugins for Hermes, but when I came across [Engram](https://weaviate.io/blog/engram-generally-available), Weaviate’s memory solution for AI agents, it looked like exactly what I needed. It gave me the opportunity to both enhance my agent’s memory and put Engram to the test.

In this article, I will walk you through how I built a voice-enabled conference agent on top of Hermes, and how I extended its memory by building a memory plugin using Engram.

So here’s what I set out to build.

The conference agent would sit at a booth on a laptop while visitors walk up and interact with it. The agent would answer questions about the project, try to get to know the visitors, and keep track of conversations. Later, the booth owners could come back and ask the agent questions about what particular visitors were interested in.

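The conference agent needs a few things:

- An AI agent to hold the conversations
- An interface for attendees to talk to it
- An interface for booth owners to query it afterwards
- Memory that persists across visitors and conversations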

For the AI agent, Hermes already had me covered. For the attendee interface, the [Hermes CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli) was perfect since it already has a real-time [voice mode](https://hermes-agent.nousresearch.com/docs/user-guide/features/voice-mode).

Then for the booth owners, the Hermes [messaging gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging/) makes it possible to communicate with the agent over Telegram and retrieve insights from conversations.

So Hermes already had almost everything covered except for one thing: **memory**.

Hermes’ built in memory system was designed around single user interactions. It stores memory in two markdown files: `Memory.md` for general facts and `User.md` for user specific information.

The problem is that this memory is very limited. `Memory.md` can only hold about **2,200 characters** (roughly 800 tokens), while `User.md` holds around **1,375 characters** (about 500 tokens).

That simply isn’t enough for a conference booth interacting with potentially hundreds of visitors.
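To put those limits in perspective, here is a rough back-of-the-envelope calculation. The 150-character note size is an assumption picked purely for illustration; it is not a figure from Hermes.

```python
# Character limits quoted above for Hermes' built-in memory files.
MEMORY_MD_LIMIT = 2_200
USER_MD_LIMIT = 1_375

# Assumption for illustration: one short note per visitor, e.g.
# "Paul, backend dev, asked about Engram pricing and RAG use cases."
CHARS_PER_VISITOR_NOTE = 150


def visitors_that_fit(limit_chars: int, note_chars: int) -> int:
    """Return how many whole notes of note_chars fit in limit_chars."""
    return limit_chars // note_chars


total = MEMORY_MD_LIMIT + USER_MD_LIMIT
print(visitors_that_fit(MEMORY_MD_LIMIT, CHARS_PER_VISITOR_NOTE))  # 14
print(visitors_that_fit(total, CHARS_PER_VISITOR_NOTE))  # 23
```

Even with both files combined, that is a couple of dozen short notes at best, well short of a busy conference day.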

That’s where [Engram](https://weaviate.io/blog/engram-deep-dive) comes in.

Thanks to Hermes’ [plugin system](https://hermes-agent.nousresearch.com/docs/user-guide/features/plugins), I could extend Hermes’ built-in memory and add Engram as a [custom memory provider](https://hermes-agent.nousresearch.com/docs/developer-guide/memory-provider-plugin#adding-cli-commands).

Let’s briefly explore Engram.

[Engram](https://weaviate.io/blog/engram-deep-dive) is Weaviate’s managed memory and context service, purpose-built to help AI agents orchestrate workflows, learn from experience, and anchor decisions to trusted knowledge.

When a user interacts with an agent, Engram extracts useful information from the conversation and stores it as memory. These extracted memories are then committed to Engram for later use.

One thing that makes Engram stand out is that it doesn’t just keep adding new memory blindly. It can also retrieve existing memories and update them with new information. This helps avoid duplicates and keeps memory more accurate over time.

That’s one of the main reasons Engram was a good fit for my project. I didn’t want a system that just keeps piling up redundant information.
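The difference between appending blindly and updating in place can be shown with a toy in-memory store. This is only an illustration of the idea: it keys facts by a hypothetical `(subject, attribute)` pair and says nothing about how Engram actually matches or merges memories.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (subject, attribute, value)


def append_only(facts: List[Fact]) -> List[Fact]:
    """Keep every fact as written, including stale and repeated ones."""
    return list(facts)


def upsert(facts: List[Fact]) -> Dict[Tuple[str, str], str]:
    """Keep one value per (subject, attribute); later facts overwrite earlier ones."""
    store: Dict[Tuple[str, str], str] = {}
    for subject, attribute, value in facts:
        store[(subject, attribute)] = value
    return store


facts = [
    ("Paul", "role", "backend developer"),
    ("Paul", "interest", "RAG"),
    ("Paul", "interest", "agent memory"),  # a later statement replaces the earlier one
]

print(len(append_only(facts)))  # 3
print(len(upsert(facts)))  # 2
print(upsert(facts)[("Paul", "interest")])  # agent memory
```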

Once memories are stored, they can be searched later and used by an AI agent or even in other pipelines like RAG systems.

Engram is [generally available](https://weaviate.io/blog/engram-generally-available) to everyone. To get started, head over to [Weaviate Console](https://console.weaviate.cloud/) and sign up for the free tier. Once you're in, navigate to the Engram dashboard and create an API key.

Then, install the Python SDK:

```
pip install weaviate-engram
```

Then create a client:

``` python
from engram import EngramClient

client = EngramClient(api_key="your-api-key")
```

There are two main ways to add memory to Engram: using **strings** or **conversations**.

With strings, you can pass a single piece of text and Engram will extract useful information from it:

``` python
run = client.memories.add(
    "Alice prefers async Python and avoids Java.",
    user_id="hermes",
)
```

With conversations, you pass the list of messages exchanged with an AI assistant:

```
run = client.memories.add(
   [
       {"role": "user", "content": "What's the best way to handle retries?"},
       {"role": "assistant", "content": "Exponential backoff with jitter is the standard approach."},
       {"role": "user", "content": "Got it — I'll use that in my HTTP client."},
   ],
   user_id="hermes",
)
```

The `user_id` is used to scope memory to a specific user, so you can easily separate and retrieve memories per person.

You can then search stored memories like this:

``` python
results = client.memories.search(
    query="What does Alice think about Python?",
    user_id="hermes",
)
for memory in results:
    print(memory.content)
```

This lets you retrieve only the most relevant memories for a given user.
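When search results are fed back into a prompt, it helps to turn them into a single block of text. The helper below is a sketch: it relies only on what the snippet above shows (results are iterable and each item has a `content` attribute), and `FakeMemory` is a hypothetical stand-in so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class FakeMemory:
    """Hypothetical stand-in for a search result; only `content` is assumed."""

    content: str


def format_memories(results: Iterable[Any], limit: int = 5) -> str:
    """Join up to `limit` memory contents into a bulleted block of text."""
    lines = []
    for index, memory in enumerate(results):
        if index >= limit:
            break
        lines.append(f"- {memory.content}")
    return "\n".join(lines) if lines else "No memories found."


results = [
    FakeMemory("Alice prefers async Python."),
    FakeMemory("Alice avoids Java."),
]
print(format_memories(results))
```

Capping the number of memories keeps the prompt short when a visitor has a long history.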

Now that we understand how Engram works, let’s build the plugin.

In Hermes, every memory plugin inherits from the `MemoryProvider` class. This is an abstract base class, which means we need to implement the methods we want to use.

A memory plugin consists of two main files:

- `__init__.py`: This contains the actual plugin implementation
- `plugin.yaml`: This defines the plugin metadata

Here’s how we are going to implement the plugin:

After each session with a user, the plugin will store the full conversation in Engram. Engram will then extract the relevant memories from it automatically.

We’ll also expose a tool that allows the agent to search Engram and retrieve relevant memories when needed.

Let’s start implementing it. Below is the basic plugin structure without the logic implemented yet.

This code will live in `__init__.py`.

``` python
import json
import os
from typing import Any, Dict, List

from agent.memory_provider import MemoryProvider
from engram import EngramClient

class Engram(MemoryProvider):

   @property
   def name(self) -> str:
       return "engram"

   def is_available(self) -> bool:
       return bool(os.environ.get("ENGRAM_API_KEY"))

   def initialize(self, session_id: str, **kwargs) -> None:
       pass

   def get_config_schema(self):
       pass

   def on_memory_write(
       self,
       action: str,
       target: str,
       content: str,
   ) -> None:
       pass

   def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
       pass

   def get_tool_schemas(self) -> List[Dict[str, Any]]:
       pass

   def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
       pass
```

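Here are the methods that will be implemented:

- `name`: returns the provider name, `engram`
- `is_available`: reports whether the `ENGRAM_API_KEY` environment variable is set
- `initialize`: called when the plugin is loaded; creates the Engram client
- `get_config_schema`: defines the configuration the plugin needs
- `on_memory_write`: hook triggered whenever Hermes writes to its Markdown memory
- `on_session_end`: hook triggered when a conversation session ends
- `get_tool_schemas`: exposes the tool schemas to the agent
- `handle_tool_call`: runs whenever the agent calls the tool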

Let’s start by implementing the `initialize`

and `get_config_schema`

methods.

``` python
   def initialize(self, session_id: str, **kwargs) -> None:
       self._client = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
       self._user_id = session_id
```

This method is called when the plugin is loaded. Here we initialize the Engram client using the API key stored in the environment variables.

We also store the `session_id`. This will be used as the `user_id` when storing and searching memories.

Next is the configuration schema:

``` python
   def get_config_schema(self):
       return [
           {
               "key": "api_key",
               "description": "Engram API key",
               "secret": True,
               "required": True,
               "env_var": "ENGRAM_API_KEY",
               "url": "https://console.weaviate.cloud/engram",
           }
       ]
```

In `get_config_schema`, we define the configuration needed by the plugin. In this case, the plugin requires an `ENGRAM_API_KEY`.
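The schema can also double as a checklist. As a sketch, the hypothetical helper below (not part of Hermes) walks a schema in the shape shown above and reports which required environment variables are still unset, which can help when working out why `is_available` returns `False`.

```python
import os
from typing import Any, Dict, List, Mapping


def missing_env_vars(
    schema: List[Dict[str, Any]],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return env var names for required schema entries that are unset or empty."""
    missing = []
    for entry in schema:
        env_var = entry.get("env_var")
        if entry.get("required") and env_var and not env.get(env_var):
            missing.append(env_var)
    return missing


schema = [
    {"key": "api_key", "required": True, "secret": True, "env_var": "ENGRAM_API_KEY"},
]
print(missing_env_vars(schema, env={}))  # ['ENGRAM_API_KEY']
print(missing_env_vars(schema, env={"ENGRAM_API_KEY": "abc123"}))  # []
```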

The `on_memory_write` and `on_session_end` methods are hooks connected to Hermes’ event system.

Whenever Hermes writes memory to its Markdown files, it triggers the `on_memory_write` hook. In this method, we send that memory directly to Engram.

``` python
   def on_memory_write(
       self,
       action: str,
       target: str,
       content: str,
   ) -> None:
       self._client.memories.add(content, user_id=self._user_id)
```

One important thing to note is that Hermes still uses its built in Markdown memory system alongside an external provider. We can use this to keep Engram continuously updated with the memories Hermes writes locally.
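One thing the hook above does not do is look at `action`. If Hermes reports removals through the same hook, forwarding them unchanged would re-add content the agent just deleted. The guard below is a sketch under that assumption; the action names `add`, `replace`, and `remove` are assumed here and should be checked against the Hermes memory provider docs.

```python
# Assumption: these are the action names Hermes passes to on_memory_write
# for writes that should be mirrored to Engram.
FORWARDED_ACTIONS = {"add", "replace"}


def should_forward(action: str, content: str) -> bool:
    """Return True when a local memory write is worth sending to Engram."""
    if action not in FORWARDED_ACTIONS:
        return False
    # Skip blank or whitespace-only writes.
    return bool(content and content.strip())


print(should_forward("add", "Paul is interested in RAG"))  # True
print(should_forward("remove", "Paul is interested in RAG"))  # False
print(should_forward("add", "   "))  # False
```

Inside `on_memory_write`, the call to `self._client.memories.add(...)` would then sit behind `if should_forward(action, content):`.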

The `on_session_end` hook is triggered when a conversation session ends. Here, we store the entire conversation between the user and the agent.

``` python
   def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
       parsed_message = []
       for message in messages:
           if message['role'] == 'user':
               parsed_message.append({'role': 'user', 'content': message['content'] })

           if message['role'] == 'assistant':
               parsed_message.append({'role': 'assistant', 'content': message['content'] })

       self._client.memories.add(parsed_message, user_id=self._user_id)
```

Both hooks use the session ID as the `user_id`. I decided to do it this way so that every visitor interacting with the booth agent gets their own dedicated memory scope. This keeps memories grouped per visitor instead of mixing conversations together.
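The loop in `on_session_end` assumes every message has a string `content`. Depending on the model and tools in use, a session may also contain tool messages or assistant turns with empty content. The helper below is a defensive sketch of the same filtering; the message shapes it guards against are assumptions, not something confirmed from Hermes.

```python
from typing import Any, Dict, List

KEPT_ROLES = ("user", "assistant")


def to_engram_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep user/assistant turns that carry non-empty text content."""
    parsed = []
    for message in messages:
        role = message.get("role")
        content = message.get("content")
        if role not in KEPT_ROLES:
            continue  # drop system and tool messages
        if not isinstance(content, str) or not content.strip():
            continue  # drop empty or non-text turns
        parsed.append({"role": role, "content": content})
    return parsed


session = [
    {"role": "system", "content": "You are Hermes."},
    {"role": "user", "content": "Hi, I'm Paul."},
    {"role": "assistant", "content": None},
    {"role": "tool", "content": "{}"},
    {"role": "assistant", "content": "Nice to meet you, Paul!"},
]
print(to_engram_messages(session))
```

Only the two text turns survive, so nothing unexpected is handed to `memories.add`.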

For the agent to search memories stored in Engram, we first need to define a tool schema.

```
SEARCH_SCHEMA = {
    "name": "engram_search",
    "description": (
        "Search memories in engram"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for in engram's memory.",
            },
            "user_id": {
                "type": "string",
                "description": "The user ID to search memories for.",
            },
        },
        "required": ["query", "user_id"],
    },
}
```
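Because the schema declares both `query` and `user_id` as required, incoming arguments can be checked against it before anything is sent to Engram. A minimal sketch, using a hypothetical helper name and a trimmed copy of the schema:

```python
from typing import Any, Dict, List

# Trimmed copy of the schema above; only the fields this check reads.
SCHEMA = {
    "name": "engram_search",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "user_id": {"type": "string"},
        },
        "required": ["query", "user_id"],
    },
}


def missing_required(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return the required parameter names that are absent or empty in args."""
    required = schema["parameters"].get("required", [])
    return [name for name in required if not args.get(name)]


print(missing_required(SCHEMA, {"query": "What did Paul ask about?"}))  # ['user_id']
print(missing_required(SCHEMA, {"query": "q", "user_id": "session-1"}))  # []
```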

Next, we expose the tool schema through `get_tool_schemas`:

``` python
   def get_tool_schemas(self) -> List[Dict[str, Any]]:
       return [SEARCH_SCHEMA]
```

Finally, we implement `handle_tool_call`, which runs whenever the agent calls the tool.

``` python
    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:

        if tool_name == "engram_search":
            query = args["query"]
            user_id = args["user_id"]
            results = self._client.memories.search(query=query, user_id=user_id)
            text = []
            for result in results:
                text.append(result.content)

            return json.dumps({"result": "\n".join(text)})

        return json.dumps({"error": f"Unknown tool {tool_name}"})
```
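As written, a failed request to Engram raises inside the tool call. Whether Hermes catches that is not something this article establishes, so a cautious variant returns the failure as JSON for the model to read. The sketch below is a standalone function rather than a method, and `FakeMemories` is a hypothetical stand-in that mimics only the `search(query=..., user_id=...)` call used above.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict


class FakeMemories:
    """Hypothetical stand-in for client.memories; fails for unknown users."""

    def search(self, query: str, user_id: str):
        if user_id != "session-1":
            raise RuntimeError("user not found")
        return [SimpleNamespace(content="Paul asked about RAG pipelines.")]


def safe_search(memories: Any, args: Dict[str, Any]) -> str:
    """Run a memory search and always return a JSON string."""
    query, user_id = args.get("query"), args.get("user_id")
    if not query or not user_id:
        return json.dumps({"error": "query and user_id are required"})
    try:
        results = memories.search(query=query, user_id=user_id)
    except Exception as exc:  # report the failure instead of raising
        return json.dumps({"error": f"Engram search failed: {exc}"})
    return json.dumps({"result": "\n".join(r.content for r in results)})


print(safe_search(FakeMemories(), {"query": "RAG", "user_id": "session-1"}))
print(safe_search(FakeMemories(), {"query": "RAG", "user_id": "nobody"}))
```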

After implementing the Engram memory provider, we can register it as a memory plugin at the bottom of the file:

``` python
def register(ctx) -> None:
   """Called by the memory plugin discovery system."""
   ctx.register_memory_provider(Engram())
```

This allows Hermes to discover and load our plugin automatically.

Here’s the complete `__init__.py` file:

``` python
import json
import os
from typing import Any, Dict, List

from agent.memory_provider import MemoryProvider
from engram import EngramClient

SEARCH_SCHEMA = {
    "name": "engram_search",
    "description": (
        "Search memories in engram"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for in engram's memory.",
            },
            "user_id": {
                "type": "string",
                "description": "The user ID to search memories for.",
            },
        },
        "required": ["query", "user_id"],
    },
}

class Engram(MemoryProvider):

    @property
    def name(self) -> str:
        return "engram"

    def is_available(self) -> bool:
        return bool(os.environ.get("ENGRAM_API_KEY"))

    def initialize(self, session_id: str, **kwargs) -> None:
        self._client = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
        self._user_id = session_id

    def get_config_schema(self):
        return [
            {
                "key": "api_key",
                "description": "Engram API key",
                "secret": True,
                "required": True,
                "env_var": "ENGRAM_API_KEY",
                "url": "https://console.weaviate.cloud/engram",
            }
        ]

    def on_memory_write(
        self,
        action: str,
        target: str,
        content: str,
    ) -> None:
        self._client.memories.add(content, user_id=self._user_id)

    def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
        parsed_message = []
        for message in messages:
            if message['role'] == 'user':
                parsed_message.append({'role': 'user', 'content': message['content'] })

            if message['role'] == 'assistant':
                parsed_message.append({'role': 'assistant', 'content': message['content'] })

        self._client.memories.add(parsed_message, user_id=self._user_id)

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        return [SEARCH_SCHEMA]

    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:

        if tool_name == "engram_search":
            query = args["query"]
            user_id = args["user_id"]
            results = self._client.memories.search(query=query, user_id=user_id)
            text = []
            for result in results:
                text.append(result.content)

            return json.dumps({"result": "\n".join(text)})

        return json.dumps({"error": f"Unknown tool {tool_name}"})

def register(ctx) -> None:
    """Called by the memory plugin discovery system."""
    ctx.register_memory_provider(Engram())
```

Next, we can create the `plugin.yaml` file. This file stores the plugin metadata and defines the hooks used by the plugin.

```
name: engram
version: 1.0.0
description: "Engram is a fully managed memory service by Weaviate. It lets you add persistent, personalized memory to AI assistants and agents."
pip_dependencies:
 - weaviate-engram
hooks:
 - on_session_end
 - on_memory_write
```

Now that the memory plugin is implemented, let’s set it up in Hermes. First, place both the `__init__.py` and `plugin.yaml` files inside a folder called `engram`.

Your directory should look like this:

```
engram/
├── __init__.py
└── plugin.yaml
```
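Before moving the folder, a quick check can confirm both files are in place. The helper name below is hypothetical, and the demo runs against a temporary directory rather than a real Hermes install.

```python
import tempfile
from pathlib import Path
from typing import List

REQUIRED_FILES = ("__init__.py", "plugin.yaml")


def missing_plugin_files(plugin_dir: Path) -> List[str]:
    """Return the required plugin files that do not exist in plugin_dir."""
    return [name for name in REQUIRED_FILES if not (plugin_dir / name).is_file()]


# Demo: a plugin folder that has the code file but not the metadata file yet.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp) / "engram"
    plugin_dir.mkdir()
    (plugin_dir / "__init__.py").write_text("# plugin code\n")
    demo_missing = missing_plugin_files(plugin_dir)

print(demo_missing)  # ['plugin.yaml']
```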

Next, move the `engram` directory into the Hermes plugins directory:

```
mv engram ~/.hermes/plugins/
```

Hermes automatically discovers plugins from this location.

Now enable the plugin by running:

```
hermes plugins enable engram
```

Next, run:

```
hermes memory
```

This lets you confirm that `engram` now appears as one of the available memory providers.

After that, run the setup command:

```
hermes memory setup
```

You’ll be prompted with a list of available memory providers. Select `engram` and provide your Engram API key when asked.

Once setup is complete, run:

```
hermes memory
```

This time, the output shows that Engram is configured as the active memory provider.

Now that the Engram plugin is working, let’s test it out.

Before launching Hermes, we first need to modify the `SOUL.md` file located at `~/.hermes/SOUL.md`. This file defines Hermes’ personality and behavior.

For this demo, I want Hermes to behave like an AI agent stationed at a Weaviate booth, showcasing Engram at a conference. I also want it to treat every conversation as a completely new interaction.

Here’s the modified `SOUL.md` file:

```
# Hermes Agent Persona

You are Hermes, an AI agent representing Weaviate at a conference booth. Your role is to help attendees learn about Weaviate and answer questions about its products, especially Engram, Weaviate’s memory product.

Treat every conversation as if you are speaking to a new attendee for the first time. Be warm, friendly, and approachable.
Start by introducing yourself, asking for the person’s name, and asking what they would like to know about Engram or Weaviate.

Your goal is to clearly explain concepts, answer questions accurately, and help people understand how Engram can be used in real world AI applications. Keep your responses conversational, engaging, and easy to understand regardless of the attendee’s technical background.
```

With that in place, we can launch the Hermes CLI and start chatting with the agent.

I took on the persona of “Paul” and had a conversation with Hermes. After the interaction, I closed the session using `Ctrl + C`.

When I opened the Engram dashboard, I could see that memories from the conversation had been successfully stored.

I could also browse memories from other sessions and confirm that each visitor’s interactions were being stored separately.

This means Hermes can now retain information across multiple users and conversations instead of relying only on short term Markdown memory.

With Engram memory now integrated into Hermes, the conference agent is almost ready. The next thing we need to set up is voice support.

There are different approaches to [handling voice](https://hermes-agent.nousresearch.com/docs/user-guide/features/voice-mode) in Hermes. You could use cloud models for both speech to text and text to speech, or you could run everything locally.

I decided to go with local models. Here was my setup:

First, I installed the required Python dependencies:

```
pip install "hermes-agent[voice]"
pip install -U "neutts[all]"
pip install sounddevice numpy
```

NeuTTS provides local text to speech capabilities.

Next, I installed the system dependencies required for audio processing:

```
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng   # required for NeuTTS
```

Once everything is installed, Hermes can be launched in the terminal and voice mode enabled with:

```
/voice on
```

To enable text to speech:

```
/voice tts
```

The last feature to add is the [messaging gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging/), which allows booth owners to message the agent and retrieve insights from ongoing conversations with attendees.

This makes it possible to ask questions, monitor interactions, and extract information even when they are not physically at the booth, without interrupting attendees interacting with the agent.

Hermes supports [multiple messaging platforms](https://hermes-agent.nousresearch.com/docs/user-guide/messaging/#platform-comparison). In this case, I used Telegram as the primary interface for interacting with the agent.

Follow the official Hermes guide to [set up Telegram](https://hermes-agent.nousresearch.com/docs/user-guide/messaging/telegram) as your messaging platform, or choose another supported option. Once configured, [enable the gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging/telegram#start-the-gateway) and you can start interacting with the agent remotely.

The image above shows how I used the `engram_search` tool to retrieve memory for a specific user using their session ID. Since session IDs are only accessible to authorized admins via the Engram dashboard, this keeps the data private while still allowing useful insights for booth owners.

With the conference agent completed, let’s recap what we have built:

While this agentic setup was built with conference booths in mind, we could take the memory plugin we built and apply it to the following use cases:

**You can get the full code in this article here**: [Hermes_Engram_Plugin](https://github.com/Studio1HQ/Hermes_Engram_Plugin)

What started as a simple idea for a better conference booth experience turned into a full AI agent with persistent memory, real time interaction, and voice capabilities.

With [Hermes](https://github.com/nousresearch/hermes-agent) handling conversations, [Engram](https://weaviate.io/blog/engram-generally-available) managing long term memory, and voice support making interactions feel natural, the agent is no longer just answering questions; it’s actively remembering people, conversations, and context across sessions.

While this solution was built with a conference setting in mind, the memory plugin we developed can be used far beyond that. From personal AI assistants to more complex multi user systems, the same approach can help any agent retain meaningful context over time and deliver more useful, personalized interactions.
