Recently, I have been attending a lot of conferences where different booths showcase different projects. One thing I kept noticing was how often visitors approached a booth only to find nobody there to answer their questions.
After seeing this happen several times, I started wondering: what if every booth had an AI agent capable of answering visitor questions and keeping track of interactions for the booth owners?
So I decided to build one using Hermes.
But I quickly ran into a problem: memory.
Hermes’ default memory system was designed for smaller, single-user interactions. I needed something that could retain information across many different visitors and conversations.
There are multiple third-party memory plugins for Hermes, but then I came across Engram, Weaviate’s memory solution for AI agents. It looked like exactly what I needed, giving me the opportunity to both enhance my agent’s memory and put Engram to the test.
In this article, I will walk you through how I built a voice-enabled conference agent on top of Hermes and how I extended its memory by building a memory plugin using Engram.
So here’s what I set out to build.
The conference agent would sit at a booth on a laptop while visitors walk up and interact with it. The agent would answer questions about the project, try to get to know the visitors, and keep track of conversations. Later, the booth owners could come back and ask the agent questions about what particular visitors were interested in.
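The conference agent needs a few things: an AI agent, an interface for attendees, an interface for booth owners, and memory that can keep up with many visitors.
For the AI agent, Hermes already had me covered. For the attendee interface, the Hermes CLI was perfect since it already has a real-time voice mode.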
Then for the booth owners, the Hermes messaging gateway makes it possible to communicate with the agent over Telegram and retrieve insights from conversations.
So Hermes already had almost everything covered except for one thing: memory.
Hermes’ built-in memory system was designed around single-user interactions. It stores memory in two Markdown files: Memory.md for general facts and User.md for user-specific information.
The problem is that this memory is very limited. Memory.md can only hold about 2,200 characters (roughly 800 tokens), while User.md holds around 1,375 characters (about 500 tokens).
That simply isn’t enough for a conference booth interacting with potentially hundreds of visitors.
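To put those limits in perspective, here is a rough back-of-the-envelope sketch. The two limits are the ones quoted above; the 150-character note size is my own assumption for illustration, not a Hermes figure.

```python
# Character limits quoted above for Hermes' built-in memory files.
MEMORY_MD_LIMIT = 2_200
USER_MD_LIMIT = 1_375

# Assumption for illustration only: one short note per visitor, e.g.
# "Paul, backend dev, asked about pricing and how memory updates work."
CHARS_PER_VISITOR_NOTE = 150


def visitors_that_fit(limit_chars: int, note_chars: int = CHARS_PER_VISITOR_NOTE) -> int:
    """Return how many fixed-size visitor notes fit within a character limit."""
    return limit_chars // note_chars


print(visitors_that_fit(MEMORY_MD_LIMIT))  # 14
print(visitors_that_fit(USER_MD_LIMIT))    # 9
```

Even with very terse notes, the built-in files would fill up after a handful of booth visitors.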
That’s where Engram comes in.
Thanks to Hermes’ plugin system, I could extend Hermes’ built-in memory and add Engram as a custom memory provider.
Let’s briefly explore Engram.
Engram is Weaviate’s managed memory and context service, purpose-built to help AI agents orchestrate workflows, learn from experience, and anchor decisions to trusted knowledge.
When a user interacts with an agent, Engram extracts useful information from the conversation and stores it as memory. These extracted memories are then committed to Engram for later use.
One thing that makes Engram stand out is that it doesn’t just keep adding new memory blindly. It can also retrieve existing memories and update them with new information. This helps avoid duplicates and keeps memory more accurate over time.
That’s one of the main reasons Engram was a good fit for my project. I didn’t want a system that just keeps piling up redundant information.
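The difference between appending and updating can be illustrated with a toy store. This is not how Engram works internally (I am not describing its algorithm here); the ToyMemoryStore class and its topic keys are hypothetical and only show why updating beats blind appending.

```python
from typing import Dict, List


class ToyMemoryStore:
    """Hypothetical in-memory store keyed by topic; not Engram's implementation."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, str] = {}
        self.append_only_log: List[str] = []

    def remember(self, topic: str, fact: str) -> None:
        # Append-only behaviour: every statement is kept, even stale ones.
        self.append_only_log.append(fact)
        # Update behaviour: a newer fact on the same topic replaces the old one.
        self._by_topic[topic] = fact

    def facts(self) -> List[str]:
        """Return the current fact for every topic."""
        return list(self._by_topic.values())


store = ToyMemoryStore()
store.remember("paul/language", "Paul prefers Python.")
store.remember("paul/language", "Paul now prefers Rust.")

print(len(store.append_only_log))  # 2 entries, one of them stale
print(store.facts())               # ['Paul now prefers Rust.']
```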
Once memories are stored, they can be searched later and used by an AI agent or even in other pipelines like RAG systems.
Engram is generally available to everyone. To get started, head over to Weaviate Console and sign up for the free tier. Once you're in, navigate to the Engram dashboard and create an API key.
Then, install the Python SDK:
pip install weaviate-engram
Then create a client:
from engram import EngramClient
client = EngramClient(api_key="your-api-key")
There are two main ways to add memory to Engram: using strings or conversations.
With strings, you can pass a single piece of text and Engram will extract useful information from it:
run = client.memories.add(
    "Alice prefers async Python and avoids Java.",
    user_id="hermes",
)
Then for a conversation with an AI assistant:
run = client.memories.add(
    [
        {"role": "user", "content": "What's the best way to handle retries?"},
        {"role": "assistant", "content": "Exponential backoff with jitter is the standard approach."},
        {"role": "user", "content": "Got it, I'll use that in my HTTP client."},
    ],
    user_id="hermes",
)
The user_id is used to scope memory to a specific user, so you can easily separate and retrieve memories per person.
You can then search stored memories like this:
results = client.memories.search(query="What does Alice think about Python?", user_id="hermes")
for memory in results:
    print(memory.content)
This lets you retrieve only the most relevant memories for a given user.
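To see what scoping by user_id buys you without calling the service, here is a small stand-in. FakeMemories is hypothetical and uses naive word overlap instead of Engram's real search; only the add/search call shape mirrors the snippets above.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def _words(text: str) -> Set[str]:
    """Lowercase a text and split it into a set of words, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))


class FakeMemories:
    """Hypothetical stand-in that keeps one list of memories per user_id."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = defaultdict(list)

    def add(self, content: str, user_id: str) -> None:
        self._store[user_id].append(content)

    def search(self, query: str, user_id: str) -> List[str]:
        # Naive relevance: keep memories sharing at least one word with the query,
        # and only ever look inside the requested user's bucket.
        query_words = _words(query)
        return [m for m in self._store.get(user_id, []) if query_words & _words(m)]


memories = FakeMemories()
memories.add("Alice prefers async Python.", user_id="alice")
memories.add("Bob prefers Java.", user_id="bob")

print(memories.search("python", user_id="alice"))  # ['Alice prefers async Python.']
print(memories.search("python", user_id="bob"))    # []
```

The same query returns different results depending on whose memories are being searched, which is the property the booth agent relies on.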
Now that we understand how Engram works, let’s build the plugin.
In Hermes, every memory plugin inherits from the MemoryProvider class. This is an abstract base class, which means we need to implement the methods we want to use.
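For readers less familiar with abstract base classes, here is a minimal sketch of the mechanism. ToyProvider is a hypothetical stand-in, not Hermes' actual MemoryProvider, and I am assuming a single abstract member for brevity.

```python
from abc import ABC, abstractmethod


class ToyProvider(ABC):
    """Hypothetical base class standing in for a plugin interface."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Subclasses must say what they are called."""

    def is_available(self) -> bool:
        # Non-abstract methods come with a default and are optional to override.
        return True


class Incomplete(ToyProvider):
    pass  # forgot to implement `name`


class Complete(ToyProvider):
    @property
    def name(self) -> str:
        return "complete"


try:
    Incomplete()
except TypeError as exc:
    # Python refuses to instantiate a class with unimplemented abstract members.
    print("cannot instantiate:", exc)

print(Complete().name)  # complete
```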
A memory plugin consists of two main files:
__init__.py: This contains the actual plugin implementation.
plugin.yaml: This defines the plugin metadata.
Here’s how we are going to implement the plugin:
After each session with a user, the plugin will store the full conversation in Engram. Engram will then extract the relevant memories from it automatically.
We’ll also expose a tool that allows the agent to search Engram and retrieve relevant memories when needed.
Let’s start implementing it. Below is the basic plugin structure without the logic implemented yet.
This code will live in __init__.py.
import json
import os
from typing import Any, Dict, List

from agent.memory_provider import MemoryProvider
from engram import EngramClient


class Engram(MemoryProvider):
    @property
    def name(self) -> str:
        return "engram"

    def is_available(self) -> bool:
        return bool(os.environ.get("ENGRAM_API_KEY"))

    def initialize(self, session_id: str, **kwargs) -> None:
        pass

    def get_config_schema(self):
        pass

    def on_memory_write(
        self,
        action: str,
        target: str,
        content: str,
    ) -> None:
        pass

    def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
        pass

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        pass

    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
        pass
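Here are the methods that will be implemented:
name: returns the name Hermes uses to identify the provider.
is_available: reports whether the ENGRAM_API_KEY environment variable is set.
initialize: creates the Engram client and stores the session ID.
get_config_schema: declares the configuration the plugin needs.
on_memory_write: forwards memories that Hermes writes locally to Engram.
on_session_end: stores the full conversation in Engram when a session ends.
get_tool_schemas: exposes the search tool to the agent.
handle_tool_call: runs the search when the agent calls the tool.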
Let’s start by implementing the initialize and get_config_schema methods.
def initialize(self, session_id: str, **kwargs) -> None:
    self._client = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
    self._user_id = session_id
This method is called when the plugin is loaded. Here we initialize the Engram client using the API key stored in the environment variables.
We also store the session_id. This will be used as the user_id when storing and searching memories.
Next is the configuration schema:
def get_config_schema(self):
    return [
        {
            "key": "api_key",
            "description": "Engram API key",
            "secret": True,
            "required": True,
            "env_var": "ENGRAM_API_KEY",
            "url": "https://console.weaviate.cloud/engram",
        }
    ]
In get_config_schema, we define the configuration needed by the plugin. In this case, the plugin requires an ENGRAM_API_KEY.
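One way to make use of a schema like this is to check the environment against it before starting the agent. The helper below is my own sketch, not part of Hermes; it assumes the schema entries use the env_var and required keys shown above, and missing_env_vars is a hypothetical name.

```python
import os
from typing import Any, Dict, List, Mapping, Optional

CONFIG_SCHEMA: List[Dict[str, Any]] = [
    {
        "key": "api_key",
        "description": "Engram API key",
        "secret": True,
        "required": True,
        "env_var": "ENGRAM_API_KEY",
    }
]


def missing_env_vars(
    schema: List[Dict[str, Any]],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the env vars that the schema requires but that are unset or empty."""
    env = os.environ if env is None else env
    return [
        entry["env_var"]
        for entry in schema
        if entry.get("required") and not env.get(entry["env_var"])
    ]


print(missing_env_vars(CONFIG_SCHEMA, env={}))                            # ['ENGRAM_API_KEY']
print(missing_env_vars(CONFIG_SCHEMA, env={"ENGRAM_API_KEY": "secret"}))  # []
```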
The on_memory_write and on_session_end methods are hooks connected to Hermes’ event system.
Whenever Hermes writes memory to its Markdown files, it triggers the on_memory_write hook. In this method, we send that memory directly to Engram.
def on_memory_write(
    self,
    action: str,
    target: str,
    content: str,
) -> None:
    self._client.memories.add(content, user_id=self._user_id)
One important thing to note is that Hermes still uses its built-in Markdown memory system alongside an external provider. We can use this to keep Engram continuously updated with the memories Hermes writes locally.
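The hook above forwards every write regardless of its action. If you want to be more selective, a small guard like the one below could sit in front of the Engram call. This is a sketch: should_forward is a hypothetical helper, and the action names ("add", "replace", "remove") are my assumption, so check the values your Hermes version actually passes.

```python
def should_forward(action: str, content: str) -> bool:
    """Decide whether a local memory write is worth sending to Engram.

    Assumption: the action names used here are illustrative, not confirmed
    against the Hermes hook contract.
    """
    if action == "remove":
        return False  # forwarding deleted text would re-add it remotely
    return bool(content and content.strip())


print(should_forward("add", "Paul is interested in pricing."))     # True
print(should_forward("remove", "Paul is interested in pricing."))  # False
print(should_forward("add", "   "))                                # False
```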
The on_session_end hook is triggered when a conversation session ends. Here, we store the entire conversation between the user and the agent.
def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
    parsed_message = []
    for message in messages:
        if message['role'] == 'user':
            parsed_message.append({'role': 'user', 'content': message['content']})
        if message['role'] == 'assistant':
            parsed_message.append({'role': 'assistant', 'content': message['content']})
    self._client.memories.add(parsed_message, user_id=self._user_id)
Both hooks use the session ID as the user_id. I decided to do it this way so that every visitor interacting with the booth agent gets their own dedicated memory scope. This keeps memories grouped per visitor instead of mixing conversations together.
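Session transcripts can also contain entries that are not plain user or assistant text, such as system prompts or tool results. The helper below is a defensive variant of the filtering loop; it is a sketch, and the assumption that some messages carry a missing or non-string content is mine rather than something documented here. The name to_engram_messages is hypothetical.

```python
from typing import Any, Dict, List

KEPT_ROLES = {"user", "assistant"}


def to_engram_messages(messages: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep only user/assistant turns that carry non-empty text."""
    parsed: List[Dict[str, str]] = []
    for message in messages:
        role = message.get("role")
        content = message.get("content")
        if role in KEPT_ROLES and isinstance(content, str) and content.strip():
            parsed.append({"role": role, "content": content})
    return parsed


transcript = [
    {"role": "system", "content": "You are Hermes."},
    {"role": "user", "content": "Hi, I'm Paul."},
    {"role": "assistant", "content": None},  # e.g. a turn with no text
    {"role": "tool", "content": "{}"},
    {"role": "assistant", "content": "Nice to meet you, Paul!"},
]
print(to_engram_messages(transcript))
```

Only the two turns with real text survive, so Engram receives a clean conversation to extract memories from.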
For the agent to search memories stored in Engram, we first need to define a tool schema.
SEARCH_SCHEMA = {
    "name": "engram_search",
    "description": (
        "Search memories in engram"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for in engram's memory.",
            },
            "user_id": {
                "type": "string",
                "description": "The user ID to search memories for.",
            },
        },
        "required": ["query", "user_id"],
    },
}
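Because the model fills in these arguments, it can help to validate them before calling Engram. The checker below is a sketch of my own that only understands the required list and string types used in this schema; it is not a full JSON Schema validator, and argument_errors is a hypothetical name.

```python
from typing import Any, Dict, List

SEARCH_PARAMETERS: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "user_id": {"type": "string"},
    },
    "required": ["query", "user_id"],
}


def argument_errors(parameters: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with `args`; an empty list means valid."""
    errors: List[str] = []
    # First pass: every required key must be present.
    for key in parameters.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    # Second pass: keys declared as strings must actually be strings.
    for key, spec in parameters.get("properties", {}).items():
        if key in args and spec.get("type") == "string" and not isinstance(args[key], str):
            errors.append(f"argument {key} must be a string")
    return errors


print(argument_errors(SEARCH_PARAMETERS, {"query": "pricing", "user_id": "abc123"}))  # []
print(argument_errors(SEARCH_PARAMETERS, {"query": 42}))
# ['missing required argument: user_id', 'argument query must be a string']
```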
Next, we expose the tool schema through get_tool_schemas:
def get_tool_schemas(self) -> List[Dict[str, Any]]:
    return [SEARCH_SCHEMA]
Finally, we implement handle_tool_call, which runs whenever the agent calls the tool.
def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
    if tool_name == "engram_search":
        query = args["query"]
        user_id = args["user_id"]
        results = self._client.memories.search(query=query, user_id=user_id)
        text = []
        for result in results:
            text.append(result.content)
        return json.dumps({"result": "\n".join(text)})
    return json.dumps({"error": f"Unknown tool {tool_name}"})
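The handler is easy to exercise offline by swapping the Engram client for a stub. In this sketch, StubClient, StubMemories, and the standalone handle_tool_call function are hypothetical test doubles that mirror the method above; only the memories.search(query=..., user_id=...) call shape and the .content attribute come from the plugin code.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


class StubMemories:
    """Returns canned results and records how it was called."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, str]] = []

    def search(self, query: str, user_id: str) -> List[SimpleNamespace]:
        self.calls.append({"query": query, "user_id": user_id})
        return [
            SimpleNamespace(content="Paul asked about pricing."),
            SimpleNamespace(content="Paul builds RAG pipelines."),
        ]


class StubClient:
    def __init__(self) -> None:
        self.memories = StubMemories()


def handle_tool_call(client: Any, tool_name: str, args: Dict[str, Any]) -> str:
    """Standalone copy of the plugin logic so it can run without Hermes."""
    if tool_name == "engram_search":
        results = client.memories.search(query=args["query"], user_id=args["user_id"])
        return json.dumps({"result": "\n".join(r.content for r in results)})
    return json.dumps({"error": f"Unknown tool {tool_name}"})


client = StubClient()
reply = json.loads(
    handle_tool_call(client, "engram_search", {"query": "pricing", "user_id": "s1"})
)
print(reply["result"])
print(handle_tool_call(client, "other_tool", {}))  # {"error": "Unknown tool other_tool"}
```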
After implementing the Engram memory provider, we can register it as a memory plugin at the bottom of the file:
def register(ctx) -> None:
    """Called by the memory plugin discovery system."""
    ctx.register_memory_provider(Engram())
This allows Hermes to discover and load our plugin automatically.
Here’s the complete __init__.py file:
import json
import os
from typing import Any, Dict, List

from agent.memory_provider import MemoryProvider
from engram import EngramClient

SEARCH_SCHEMA = {
    "name": "engram_search",
    "description": (
        "Search memories in engram"
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "What to search for in engram's memory.",
            },
            "user_id": {
                "type": "string",
                "description": "The user ID to search memories for.",
            },
        },
        "required": ["query", "user_id"],
    },
}


class Engram(MemoryProvider):
    @property
    def name(self) -> str:
        return "engram"

    def is_available(self) -> bool:
        return bool(os.environ.get("ENGRAM_API_KEY"))

    def initialize(self, session_id: str, **kwargs) -> None:
        self._client = EngramClient(api_key=os.environ["ENGRAM_API_KEY"])
        self._user_id = session_id

    def get_config_schema(self):
        return [
            {
                "key": "api_key",
                "description": "Engram API key",
                "secret": True,
                "required": True,
                "env_var": "ENGRAM_API_KEY",
                "url": "https://console.weaviate.cloud/engram",
            }
        ]

    def on_memory_write(
        self,
        action: str,
        target: str,
        content: str,
    ) -> None:
        self._client.memories.add(content, user_id=self._user_id)

    def on_session_end(self, messages: List[Dict[str, Any]]) -> None:
        parsed_message = []
        for message in messages:
            if message['role'] == 'user':
                parsed_message.append({'role': 'user', 'content': message['content']})
            if message['role'] == 'assistant':
                parsed_message.append({'role': 'assistant', 'content': message['content']})
        self._client.memories.add(parsed_message, user_id=self._user_id)

    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        return [SEARCH_SCHEMA]

    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
        if tool_name == "engram_search":
            query = args["query"]
            user_id = args["user_id"]
            results = self._client.memories.search(query=query, user_id=user_id)
            text = []
            for result in results:
                text.append(result.content)
            return json.dumps({"result": "\n".join(text)})
        return json.dumps({"error": f"Unknown tool {tool_name}"})


def register(ctx) -> None:
    """Called by the memory plugin discovery system."""
    ctx.register_memory_provider(Engram())
Next, we can create the plugin.yaml file. This file stores the plugin metadata and defines the hooks used by the plugin.
name: engram
version: 1.0.0
description: "Engram is a fully managed memory service by Weaviate. It lets you add persistent, personalized memory to AI assistants and agents."
pip_dependencies:
  - weaviate-engram
hooks:
  - on_session_end
  - on_memory_write
Now that the memory plugin is implemented, let’s set it up in Hermes. First, place both the __init__.py and plugin.yaml files inside a folder called engram.
Your directory should look like this:
engram/
├── __init__.py
└── plugin.yaml
Next, move the engram directory into the Hermes plugins directory:
mv engram ~/.hermes/plugins/
Hermes automatically discovers plugins from this location.
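If the plugin does not show up later, a quick sanity check is to confirm that both files landed in the right place. The snippet below is a small sketch of my own, not a Hermes command; the path comes from the mv command above, and missing_plugin_files is a hypothetical helper.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("__init__.py", "plugin.yaml")


def missing_plugin_files(plugin_dir: Path) -> List[str]:
    """Return the required plugin files that are absent from plugin_dir."""
    return [name for name in REQUIRED_FILES if not (plugin_dir / name).is_file()]


# Location taken from the mv command above.
engram_dir = Path.home() / ".hermes" / "plugins" / "engram"
print(missing_plugin_files(engram_dir) or "plugin files are in place")
```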
Now enable the plugin by running:
hermes plugins enable engram
Next, run:
hermes memory
This lets you confirm that engram now appears as one of the available memory providers.
After that, run the setup command:
hermes memory setup
You’ll be prompted with a list of available memory providers. Select engram and provide your Engram API key when asked.
Once setup is complete, run:
hermes memory
This time, the output should show that Engram is configured as an active memory provider.
Now that the Engram plugin is working, let’s test it out.
Before launching Hermes, we first need to modify the SOUL.md file located at ~/.hermes/SOUL.md. This file defines Hermes’ personality and behavior.
For this demo, I want Hermes to behave like an AI agent stationed at a Weaviate booth, showcasing Engram at a conference. I also want it to treat every conversation as a completely new interaction.
Here’s the modified SOUL.md file:
You are Hermes, an AI agent representing Weaviate at a conference booth. Your role is to help attendees learn about Weaviate and answer questions about its products, especially Engram, Weaviate’s memory product.
Treat every conversation as if you are speaking to a new attendee for the first time. Be warm, friendly, and approachable.
Start by introducing yourself, asking for the person’s name, and asking what they would like to know about Engram or Weaviate.
Your goal is to clearly explain concepts, answer questions accurately, and help people understand how Engram can be used in real world AI applications. Keep your responses conversational, engaging, and easy to understand regardless of the attendee’s technical background.
With that in place, we can launch the Hermes CLI and start chatting with the agent.
I took on the persona of “Paul” and had a conversation with Hermes. After the interaction, I closed the session using Ctrl + C.
When I opened the Engram dashboard, I could see that memories from the conversation had been successfully stored.
I could also browse memories from other sessions and confirm that each visitor’s interactions were being stored separately.
This means Hermes can now retain information across multiple users and conversations instead of relying only on short-term Markdown memory.
With Engram memory now integrated into Hermes, the conference agent is almost ready. The next thing we need to set up is voice support.
There are different approaches to handling voice in Hermes. You could use cloud models for both speech-to-text and text-to-speech, or you could run everything locally.
I decided to go with local models. Here was my setup:
First, I installed the required Python dependencies:
pip install "hermes-agent[voice]"
pip install -U "neutts[all]"
pip install sounddevice numpy
NeuTTS provides local text-to-speech capabilities.
Next, I installed the system dependencies required for audio processing:
sudo apt install portaudio19-dev ffmpeg libopus0
sudo apt install espeak-ng # required for NeuTTS
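Before switching voice mode on, it can save time to confirm that the pieces are actually installed. The check below is a sketch of my own, not a Hermes feature; the helper names are hypothetical, and I am assuming the importable module names match the pip package names for sounddevice and numpy.

```python
import importlib.util
import shutil
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level Python modules that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_binaries(names: Iterable[str]) -> List[str]:
    """Return the command-line tools that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("missing modules:", missing_modules(["sounddevice", "numpy"]))
print("missing binaries:", missing_binaries(["ffmpeg", "espeak-ng"]))
```

Two empty lists mean the Python packages and system tools installed above are visible to the current environment.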
Once everything is installed, Hermes can be launched in the terminal and voice mode enabled with:
/voice on
To enable text-to-speech:
/voice tts
The last feature to add is the messaging gateway, which allows booth owners to message the agent and retrieve insights from ongoing conversations with attendees.
This makes it possible to ask questions, monitor interactions, and extract information even when they are not physically at the booth, without interrupting attendees interacting with the agent.
Hermes supports multiple messaging platforms. In this case, I used Telegram as the primary interface for interacting with the agent.
Follow the official Hermes guide to set up Telegram as your messaging platform, or choose another supported option. Once configured, enable the gateway and you can start interacting with the agent remotely.
The image above shows how I used the engram_search tool to retrieve memory for a specific user using their session ID. Since session IDs are only accessible to authorized admins via the Engram dashboard, this keeps the data private while still allowing useful insights for booth owners.
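With the conference agent completed, let’s recap what we have built: an Engram memory plugin for Hermes, a voice-enabled CLI for attendees, and a Telegram gateway for booth owners.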
While this agentic setup was built with conference booths in mind, we could take the memory plugin we built and apply it to the following use cases:
You can find the full code for this article here: Hermes_Engram_Plugin
What started as a simple idea for a better conference booth experience turned into a full AI agent with persistent memory, real-time interaction, and voice capabilities.
With Hermes handling conversations, Engram managing long-term memory, and voice support making interactions feel natural, the agent is no longer just answering questions; it’s actively remembering people, conversations, and context across sessions.
While this solution was built with a conference setting in mind, the memory plugin we developed can be used far beyond that. From personal AI assistants to more complex multi-user systems, the same approach can help any agent retain meaningful context over time and deliver more useful, personalized interactions.