# One Soul, Any Model: Portable Memory for Open-Source Agents with .klickd

> Source: <https://dev.to/davincc77/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd-1k50>
> Published: 2026-05-23 01:18:46+00:00

*This is a submission for the Hermes Agent Challenge: Build With Hermes Agent*

## What I Built

I built a prototype integration between **Hermes Agent** and `.klickd`, an open portable memory format for AI agents.

The problem I wanted to explore is simple:

Every new agent session often pays again to rediscover context that already exists.

That repeated context cost shows up as:

- re-explaining project state;
- reloading constraints;
- rediscovering previous decisions;
- rebuilding handoff notes;
- rerunning tests just to find the same failure;
- losing track of which actions require human approval.

`.klickd` is designed to turn that repeated context into a portable, encrypted, versioned file that an agent can load before work starts.

Hermes Agent is a good fit for testing this because it is an open-source, self-hosted agent runtime with skills, plugins, hooks, approvals, local execution, and agentic workflow orchestration.

In this project:

Hermes runs the workflow. `.klickd` carries the state.

The prototype focuses on a benchmark called **Context Cost Benchmark**, which compares two modes:

**Baseline cold start**

The full context is pasted into the prompt every time.

**`.klickd`-loaded mode**

Structured context is loaded from a `.klickd` fixture and injected into the agent workflow.
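To make the difference between the two modes concrete, here is a minimal Python sketch of how each prompt could be assembled. The post does not show the fixture schema, so the field names (`project_state`, `locked_decisions`, `tool_permissions`) and both helper functions are hypothetical illustrations, not the benchmark's actual runner code.

```python
import json

def build_baseline_prompt(full_context: str, task: str) -> str:
    # Baseline cold start: the entire context document is pasted on every run.
    return f"{full_context}\n\nTask: {task}"

def build_klickd_prompt(state: dict, task: str) -> str:
    # .klickd-loaded mode: only selected structured fields are serialized and injected.
    # These keys are hypothetical; they are not the real .klickd schema.
    keys = ("project_state", "locked_decisions", "tool_permissions")
    selected = {k: state[k] for k in keys if k in state}
    return "Loaded state:\n" + json.dumps(selected, indent=2, sort_keys=True) + f"\n\nTask: {task}"

state = {
    "project_state": "auth module migrated; login tests failing",
    "locked_decisions": ["keep PostgreSQL"],
    "tool_permissions": {"git_push": "requires_approval"},
    "chat_history": "long transcript that is deliberately not injected",
}
print(build_klickd_prompt(state, "fix the login test"))
```

The design point is selection: the structured mode injects only the fields the task needs, while the baseline pays for the whole document every time.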

The benchmark is designed to measure:

- repeated input tokens;
- output tokens;
- estimated cost;
- latency;
- continuity errors;
- violations of locked decisions;
- violations of tool permissions;
- handoff quality;
- unnecessary reruns of expensive commands.
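One way to keep these metrics comparable across runs is a flat per-run record written as one JSON line. The sketch below is hypothetical: the field names, the 0 to 1 handoff score, and `to_jsonl_line` are illustrative assumptions rather than the schema of the repository's `raw_runs.jsonl`, and the example values are placeholders, not measurements.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class RunRecord:
    # One record per benchmark run; all field names are hypothetical.
    condition: str                 # e.g. "cold", "paste", "klickd"
    input_tokens: int
    output_tokens: int
    estimated_cost_usd: float
    latency_s: float
    continuity_errors: int = 0
    locked_decision_violations: int = 0
    tool_permission_violations: int = 0
    handoff_score: float = 0.0     # hypothetical 0..1 quality score
    unnecessary_reruns: int = 0

def to_jsonl_line(record: RunRecord) -> str:
    # Serialize one run as a single line, suitable for appending to a JSONL file.
    return json.dumps(asdict(record), sort_keys=True)

# Placeholder values for illustration only.
rec = RunRecord("klickd", input_tokens=120, output_tokens=40, estimated_cost_usd=0.0, latency_s=1.5)
print(to_jsonl_line(rec))
```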

The goal is not to claim a magic percentage improvement. The goal is to measure, reproducibly:

How many tokens and errors are we paying for simply because the agent has to rediscover state we already produced?

## Demo

For the Hermes Agent Challenge, I created an experimental Hermes integration inside the `klickdskill` repository.

The demo uses Hermes Agent to drive the local `.klickd` Context Cost Benchmark.

Here is the relevant Hermes output from the session:

```
session_id: 20260523_004058_85115c

Existing artifacts from 2026-05-23 were used. No rerun was needed.

Token-proxy totals:
- Cold: 310
- Paste: 6570
- Klickd: 5270

Verified artifacts:
- report.md
- summary.csv
- raw_runs.jsonl
- artifacts/sample_test.log

No publishes, git pushes, or external tool calls were performed.
```

The live Hermes run used:

- Hermes Agent v0.14.0
- OpenRouter free model route
- capped API key with no paid budget
- local dry-run benchmark
- no production deployment
- no package publishing
- no external posting

Hermes session:

```
20260523_004058_85115c
```

Hermes was asked to use the `klickd-context-cost` skill, inspect the benchmark outputs, and avoid rerunning work if durable artifacts already existed.

The key result:

```
Existing artifacts from 2026-05-23 were used. No rerun was needed.
```

That matters because one of the core ideas in `.klickd v4` is that agents should not spend tokens or compute rediscovering output that already exists.

The dry-run produced these local artifacts:

```
benchmarks/context_cost/results/2026-05-23/
├── report.md
├── summary.csv
├── raw_runs.jsonl
└── artifacts/
    └── sample_test.log
```

The benchmark output was explicitly marked as a **whitespace token proxy**, not a provider-token measurement. This is important: these are not OpenAI, Anthropic, or OpenRouter tokenizer counts. They are deterministic local proxy values for early validation.
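The post does not show the exact proxy the runner computes. A minimal sketch, assuming "whitespace token proxy" means counting whitespace-separated chunks (`whitespace_token_proxy` is a hypothetical name):

```python
def whitespace_token_proxy(text: str) -> int:
    # Count whitespace-separated chunks. Deterministic and offline, but it is
    # NOT a provider tokenizer count and can differ substantially from one.
    return len(text.split())

print(whitespace_token_proxy("Fix the failing login test."))  # 5
```

The appeal of a proxy like this is reproducibility: the same fixture always yields the same number on any machine, which is what early validation of the harness needs before real provider usage is measured.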

Current dry-run totals:

| Condition | Token-proxy total |
|---|---|
| Cold start | 310 |
| Full context pasted | 6570 |
| `.klickd` structured context | 5270 |

The useful result is not “`.klickd` reduces cost by X%.” That would be premature.

The useful result is:

The benchmark harness can now compare repeated context strategies, produce raw evidence, persist artifacts, and let Hermes inspect those artifacts instead of rerunning the same work.

### Verification artifacts

One lesson from real agent workflows is that agents often rerun expensive commands just to recover output they already produced.

The benchmark therefore includes a `verification_artifacts[]` pattern inspired by this idea:

```
command 2>&1 | tee .test-output/<scope>.log
```

Instead of rerunning the test suite to find a failure, the agent can inspect the persisted artifact:

```
grep -n FAIL .test-output/full.log
```
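The same persist-once, inspect-later pattern can be sketched in Python with only the standard library. `run_and_persist` and `grep_lines` are hypothetical helpers, not part of `klickdskill`; unlike `tee`, this sketch writes the log without echoing it to the terminal.

```python
import subprocess
from pathlib import Path

def run_and_persist(cmd: list[str], scope: str, out_dir: str = ".test-output") -> tuple[Path, int]:
    """Run cmd once and save combined stdout+stderr to <out_dir>/<scope>.log."""
    log_path = Path(out_dir) / f"{scope}.log"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # stderr=STDOUT merges the streams, like `2>&1` in the shell version.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    log_path.write_text(result.stdout)
    # Return the exit code too: a shell pipeline into tee reports tee's status unless pipefail is set.
    return log_path, result.returncode

def grep_lines(log_path: Path, needle: str = "FAIL") -> list[tuple[int, str]]:
    """Like `grep -n FAIL <log>`: (1-based line number, line) for each match."""
    with log_path.open() as fh:
        return [(n, line.rstrip("\n")) for n, line in enumerate(fh, start=1) if needle in line]
```

For example, `run_and_persist(["npm", "test"], "vitest")` would write `.test-output/vitest.log` once, and a later `grep_lines(...)` call answers "which test failed?" without rerunning the suite.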

In `.klickd v4`, that becomes structured state:

```json
{
  "command": "npm test",
  "artifact_path": ".test-output/vitest.log",
  "status": "failed",
  "query_hint": "grep -n FAIL .test-output/vitest.log",
  "checked_at": "2026-05-23T00:00:00Z",
  "retention": "latest",
  "scope": "project"
}
```

This turns agent memory into something more operational:

- what the agent knows;
- what the agent must verify;
- what the agent is not allowed to do without approval;
- where the evidence lives;
- what happened last time.
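Given an entry shaped like the JSON above, an agent-side check can decide whether to inspect the persisted artifact or rerun the command. This is a minimal sketch: `next_step` is hypothetical, and the 24-hour freshness window is an illustrative policy, not a `.klickd v4` field. It returns the `query_hint` rather than executing it.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

def next_step(entry: dict, max_age: timedelta = timedelta(hours=24),
              now: Optional[datetime] = None) -> tuple[str, str]:
    """Return ("inspect", query_hint) if a fresh artifact exists, else ("rerun", command)."""
    if now is None:
        now = datetime.now(timezone.utc)
    if Path(entry["artifact_path"]).is_file():
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
        checked_at = datetime.fromisoformat(entry["checked_at"].replace("Z", "+00:00"))
        if now - checked_at <= max_age:
            return ("inspect", entry["query_hint"])
    # Missing or stale evidence: fall back to running the original command.
    return ("rerun", entry["command"])
```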

## Code

Repository:

[https://github.com/Davincc77/klickdskill](https://github.com/Davincc77/klickdskill)

Hermes POC integration path:

```
integrations/hermes/
├── README.md
├── skill/
│   └── SKILL.md
├── plugin/
│   ├── plugin.yaml
│   └── __init__.py
├── scripts/
│   └── run_context_cost_benchmark.py
└── tests/
```

Context Cost Benchmark path:

```
benchmarks/context_cost/
├── RFC.md
├── runner.py
├── fixtures/
│   ├── baseline/
│   ├── klickd/
│   ├── prompts/
│   ├── validation/
│   ├── verification_artifacts/
│   └── edge_cases/
├── results/
└── tests/
```

Current benchmark pieces:

- RFC-003: Context Cost Benchmark
- local dry-run runner
- fixture validation
- deterministic token proxy
- CSV / JSONL / Markdown reports
- edge-case fixtures for:
  - migration/version break;
  - tool-call failure recovery;
  - multi-session handoff.
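The reporting step can be sketched as an aggregation from raw runs to a per-condition CSV, then a Markdown table. This assumes each raw run carries a `condition` and a `token_proxy` count (hypothetical field names); the actual `runner.py` may structure its CSV / JSONL / Markdown output differently.

```python
import csv
import io
from collections import defaultdict

def summarize(runs: list[dict]) -> str:
    # Sum the token-proxy count per condition and render a small CSV summary.
    totals: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run["condition"]] += run["token_proxy"]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["condition", "token_proxy_total"])
    for condition in sorted(totals):
        writer.writerow([condition, totals[condition]])
    return buf.getvalue()

def to_markdown(csv_text: str) -> str:
    # Turn the CSV summary into a Markdown table for report.md-style output.
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

Keeping the CSV as the single source and deriving Markdown from it means the human-readable report cannot drift from the machine-readable numbers.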

The Hermes integration currently includes:

- a Hermes-facing skill;
- an experimental plugin scaffold;
- a wrapper script that runs the local benchmark;
- tests for the wrapper;
- explicit safety constraints:
  - no provider calls from the wrapper;
  - no paid resources;
  - no publishing;
  - no production deployment;
  - no secrets.

### My Tech Stack

- **Hermes Agent**: open-source, self-hosted agent runtime
  [https://github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent)
- **Hermes Agent docs**
  [https://hermes-agent.app/en/docs](https://hermes-agent.app/en/docs)
- **`.klickd` / `klickdskill`**: portable encrypted AI context format
  [https://github.com/Davincc77/klickdskill](https://github.com/Davincc77/klickdskill)
- **`.klickd` official page**
  [https://klickd.app/klickdskill](https://klickd.app/klickdskill)
- **Python SDK**: local `.klickd` loading / saving

Current development install, until PyPI is updated:

```
pip install "git+https://github.com/Davincc77/klickdskill.git@main#subdirectory=packages/pypi/klickd"
```

Current Python import:

``` python
from klickd import load_klickd, save_klickd
```

- **GitHub Actions**: test vectors and package integrity checks
- **CSV / JSONL / Markdown**: benchmark reports
- **Local verification artifacts**: persisted logs for agent inspection
- **OpenRouter free model route**: used only to run the Hermes agent session for the demo

## How I Used Hermes Agent

Hermes Agent is used as the workflow runner for the benchmark.

The `.klickd` file is not meant to replace Hermes memory or Hermes skills. Instead, it gives Hermes a portable external state artifact it can load before work starts.

Hermes is responsible for:

- running the benchmark task;
- reading fixture context;
- executing local dry-run commands;
- inspecting generated artifacts;
- summarizing benchmark results;
- respecting approval and verification boundaries.

`.klickd` is responsible for carrying:

- project state;
- locked decisions;
- tool permissions;
- handoff notes;
- verification gates;
- human veto rules;
- claim sources;
- verification artifacts.
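As an illustration of how those responsibilities could live in one loadable object, here is a minimal sketch. `KlickdState` and `permission_for` are hypothetical, the real `.klickd v4` schema may differ, and the default-deny rule (with human-veto actions downgraded from "allow" to "ask") is an assumed policy, not the format's documented behavior.

```python
from dataclasses import dataclass, field

@dataclass
class KlickdState:
    # Hypothetical in-memory shape of the carried state.
    project_state: str = ""
    locked_decisions: list[str] = field(default_factory=list)
    tool_permissions: dict[str, str] = field(default_factory=dict)  # tool -> "allow" | "ask" | "deny"
    handoff_notes: list[str] = field(default_factory=list)
    verification_gates: list[str] = field(default_factory=list)
    human_veto: list[str] = field(default_factory=list)             # actions that need a human
    claim_sources: dict[str, str] = field(default_factory=dict)
    verification_artifacts: list[dict] = field(default_factory=list)

    def permission_for(self, tool: str) -> str:
        # Default-deny for unlisted tools; an explicit "deny" is never weakened.
        perm = self.tool_permissions.get(tool, "deny")
        if perm != "deny" and tool in self.human_veto:
            return "ask"  # allowed in principle, but a human must approve first
        return perm
```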

This is useful because multi-agent systems need more than agent-to-agent communication.

If A2A (the Agent2Agent protocol) defines how agents talk, `.klickd` explores what portable state they carry between tasks, tools, models, and sessions.

The Hermes integration is therefore not about making a chatbot remember more. It is about testing whether an open-source agent runtime can operate with structured, portable context instead of repeatedly reconstructing the same state.

The goal is to reduce:

- repeated prompt context;
- hallucinated continuations;
- forgotten decisions;
- unsafe actions;
- unnecessary reruns;
- handoff failures.

The larger idea is that agent memory should become infrastructure:

Portable state, explicit constraints, verification artifacts, and human approval boundaries.

In short:

Hermes runs the workflow. `.klickd` carries the state.

## What I Learned

The first useful result was not a performance number. It was a workflow result.

Hermes correctly used the existing benchmark artifacts instead of rerunning the dry-run unnecessarily.

That matters because a lot of agent waste is not only token waste. It is also repeated execution waste.

Agents often:

- rerun tests to rediscover failures;
- reread long logs from context;
- rebuild state from previous messages;
- regenerate summaries that already exist;
- ask the model to infer what a file could have told it deterministically.

The benchmark and Hermes POC make that waste visible.

This also clarified the role of `.klickd`: it should not only remember preferences. It should help agents know:

- what state exists;
- what evidence exists;
- what claims were executed, inspected, or assumed;
- what actions require human approval;
- what artifacts should be read before rerunning work.
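The executed / inspected / assumed distinction can be made explicit by tagging each claim with its source. A minimal sketch with a hypothetical `ClaimSource` enum; the rule that only assumed claims need verification before the agent relies on them is an illustrative assumption.

```python
from enum import Enum

class ClaimSource(Enum):
    EXECUTED = "executed"    # the agent ran something and observed the result
    INSPECTED = "inspected"  # the agent read a persisted artifact or file
    ASSUMED = "assumed"      # inferred from context, never checked

def needs_verification(claims: dict[str, ClaimSource]) -> list[str]:
    # Return assumed claims, sorted, so they can be checked before being acted on.
    return sorted(claim for claim, src in claims.items() if src is ClaimSource.ASSUMED)
```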

That is why `.klickd v4` is moving beyond portable memory toward a more operational layer:

```
portable encrypted context
+ project memory
+ verification gates
+ human veto
+ claim sources
+ verification artifacts
+ migration safety
```

## Sources

Hermes Agent Challenge:

[https://dev.to/challenges/hermes-agent-2026-05-15](https://dev.to/challenges/hermes-agent-2026-05-15)

Hermes Agent repository:

[https://github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent)

Hermes Agent documentation:

[https://hermes-agent.app/en/docs](https://hermes-agent.app/en/docs)

`.klickd` / `klickdskill` repository:

[https://github.com/Davincc77/klickdskill](https://github.com/Davincc77/klickdskill)

`.klickd` official page:

[https://klickd.app/klickdskill](https://klickd.app/klickdskill)

Related article on preserving command output for agents:

[https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427](https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427)

## Final Note

This is still early.

The benchmark does not yet claim provider-token savings. The current numbers are a deterministic local proxy. The next step is to run the same structure against real provider usage and compare actual input/output tokens, latency, and continuity failures.

But the architecture is now testable:

- Hermes can act as the workflow runner.
- `.klickd` can act as the portable state layer.
- The benchmark can produce raw evidence.
- Verification artifacts can prevent unnecessary reruns.
- The system can evolve without breaking older `.klickd` files.

That is the direction I want to keep exploring.

One soul. Any model. Any agent.
