# One Soul, Any Model: Portable Memory for Open-Source Agents with .klickd

The article describes a prototype integration between Hermes Agent and .klickd, an open portable memory format for AI agents, designed to reduce repeated context costs by allowing agents to load structured, encrypted, versioned memory files instead of rediscovering existing state. A benchmark called the Context Cost Benchmark was created to compare cold-start prompting against .klickd-loaded sessions, measuring token usage and errors. The key result was that the agent reused existing artifacts from a previous session, demonstrating that agents can avoid spending tokens or compute on rediscovering output that already exists.

This is a submission for the Hermes Agent Challenge: Build With Hermes Agent

## What I Built

I built a prototype integration between Hermes Agent and .klickd, an open portable memory format for AI agents.

The problem I wanted to explore is simple:

Every new agent session often pays again to rediscover context that already exists.

That repeated context cost shows up as:

- re-explaining project state;
- reloading constraints;
- rediscovering previous decisions;
- rebuilding handoff notes;
- rerunning tests just to find the same failure;
- losing track of which actions require human approval.

.klickd is designed to turn that repeated context into a portable, encrypted, versioned file that an agent can load before work starts.

Hermes Agent is a good fit for testing this because it is an open-source, self-hosted agent runtime with skills, plugins, hooks, approvals, local execution, and agentic workflow orchestration.

In this project:

Hermes runs the workflow. .klickd carries the state.

The prototype focuses on a benchmark called Context Cost Benchmark, which compares two modes:

- Baseline cold start: the full context is pasted into the prompt every time.
- .klickd-loaded mode: structured context is loaded from a .klickd fixture and injected into the agent workflow.
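The difference between the two modes can be sketched in a few lines of Python. This is a minimal illustration, not the prototype's code: the `STATE` dictionary, its keys, and both function names are hypothetical, and the sketch assumes the loaded mode injects only selected structured fields. It also skips the encryption and versioning that real .klickd files carry.

```python
import json

# Hypothetical project state. The keys are illustrative only and are NOT
# the real .klickd schema; real files are also encrypted and versioned.
STATE = {
    "constraints": ["no package publishing", "no external posting"],
    "locked_decisions": ["example decision that must not be reopened"],
    "handoff_notes": "Inspect existing artifacts before rerunning anything.",
}


def build_cold_start_prompt(task: str, full_context: str) -> str:
    """Baseline: the whole context is pasted ahead of the task every time."""
    return f"{full_context}\n\nTask: {task}"


def build_loaded_prompt(task: str, state: dict, fields: list[str]) -> str:
    """Loaded mode (assumed): inject only the named structured fields."""
    selected = {name: state[name] for name in fields if name in state}
    return f"Context:\n{json.dumps(selected, indent=2)}\n\nTask: {task}"


if __name__ == "__main__":
    task = "Summarize the latest benchmark report."
    print(build_cold_start_prompt(task, "<entire project history here>"))
    print(build_loaded_prompt(task, STATE, ["constraints", "handoff_notes"]))
```

Under these assumptions, the baseline prompt grows with the whole history, while the loaded prompt grows only with the fields a task asks for.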
The benchmark is designed to measure:

- repeated input tokens;
- output tokens;
- estimated cost;
- latency;
- continuity errors;
- violations of locked decisions;
- violations of tool permissions;
- handoff quality;
- unnecessary reruns of expensive commands.

The goal is not to claim a magic percentage improvement. The goal is to measure, reproducibly:

How many tokens and errors are we paying for simply because the agent has to rediscover state we already produced?

## Demo

For the Hermes Agent Challenge, I created an experimental Hermes integration inside the klickdskill repository. The demo uses Hermes Agent to drive the local .klickd Context Cost Benchmark.

If the embedded agent session does not render correctly, here is the relevant Hermes output:

    session id: 20260523 004058 85115c
    Existing artifacts from 2026-05-23 were used. No rerun was needed.
    Token-proxy totals:
    - Cold: 310
    - Paste: 6570
    - Klickd: 5270
    Verified artifacts:
    - report.md
    - summary.csv
    - raw runs.jsonl
    - artifacts/sample test.log
    No publishes, git pushes, or external tool calls were performed.

The live Hermes run used:

- Hermes Agent v0.14.0
- OpenRouter free model route
- capped API key with no paid budget
- local dry-run benchmark
- no production deployment
- no package publishing
- no external posting

Hermes session: 20260523 004058 85115c

Hermes was asked to use the klickd-context-cost skill, inspect the benchmark outputs, and avoid rerunning work if durable artifacts already existed.

The key result:

Existing artifacts from 2026-05-23 were used. No rerun was needed.

That matters because one of the core ideas in .klickd v4 is that agents should not spend tokens or compute rediscovering output that already exists.

The dry-run produced these local artifacts:

    benchmarks/context cost/results/2026-05-23/
    ├── report.md
    ├── summary.csv
    ├── raw runs.jsonl
    └── artifacts/
        └── sample test.log

The benchmark output was explicitly marked as a whitespace token proxy, not a provider-token measurement.
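For readers unfamiliar with the idea, a whitespace token proxy can be sketched in a few lines of Python. This is an assumption about what such a proxy does (split the text on whitespace and count the pieces); the function names are hypothetical and are not taken from the benchmark harness.

```python
def whitespace_token_proxy(text: str) -> int:
    """Count whitespace-delimited chunks as a rough stand-in for tokens."""
    # str.split() with no arguments splits on any run of whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


def proxy_total(parts: list[str]) -> int:
    """Sum the proxy count over every text segment sent in one run."""
    return sum(whitespace_token_proxy(part) for part in parts)


if __name__ == "__main__":
    sample = "Run the test suite and\nreport   the first failure."
    print(whitespace_token_proxy(sample))  # prints 9
```

A count like this is deterministic and needs no network access, which suits a local dry run, but it will not match what any provider's tokenizer reports for the same text.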
This distinction is important: these are not OpenAI, Anthropic, or OpenRouter tokenizer counts. They are deterministic local proxy values for early validation.

Current dry-run totals:

| Condition | Token-proxy total |
|---|---|
| Cold start | 310 |
| Full context pasted | 6570 |
| .klickd structured context | 5270 |

The useful result is not “.klickd reduces cost by X%.” That would be premature. The useful result is:

The benchmark harness can now compare repeated context strategies, produce raw evidence, persist artifacts, and let Hermes inspect those artifacts instead of rerunning the same work.

## Verification artifacts

One lesson from real agent workflows is that agents often rerun expensive commands just to recover output they already produced. The benchmark therefore includes a verification artifacts pattern inspired by this idea:

    command 2>&1 | tee .test-output/