{"slug": "one-soul-any-model-portable-memory-for-open-source-agents-with-klickd", "title": "One Soul, Any Model: Portable Memory for Open-Source Agents with .klickd", "summary": "The article describes a prototype integration between Hermes Agent and .klickd, an open portable memory format for AI agents, designed to reduce repeated context costs by allowing agents to load structured, encrypted, versioned memory files instead of rediscovering existing state. A benchmark called the Context Cost Benchmark was created to compare cold-start prompting against .klickd-loaded sessions, measuring token usage and errors. The key result was that the agent reused existing artifacts from a previous session, demonstrating that agents can avoid spending tokens or compute on rediscovering output that already exists.", "body_md": "*This is a submission for the Hermes Agent Challenge: Build With Hermes Agent*\n\n## What I Built\n\nI built a prototype integration between **Hermes Agent** and `.klickd`, an open portable memory format for AI agents.\n\nThe problem I wanted to explore is simple:\n\nEvery new agent session often pays again to rediscover context that already exists.\n\nThat repeated context cost shows up as:\n\n- re-explaining project state;\n- reloading constraints;\n- rediscovering previous decisions;\n- rebuilding handoff notes;\n- rerunning tests just to find the same failure;\n- losing track of which actions require human approval.\n\n`.klickd` is designed to turn that repeated context into a portable, encrypted, versioned file that an agent can load before work starts.\n\nHermes Agent is a good fit for testing this because it is an open-source, self-hosted agent runtime with skills, plugins, hooks, approvals, local execution, and agentic workflow orchestration.\n\nIn this project:\n\nHermes runs the workflow. `.klickd` carries the state.\n\nThe prototype focuses on a benchmark called **Context Cost Benchmark**, which compares two modes:\n\n**Baseline cold 
start**\n\nThe full context is pasted into the prompt every time.\n\n**`.klickd-loaded` mode**\n\nStructured context is loaded from a `.klickd` fixture and injected into the agent workflow.\n\nThe benchmark is designed to measure:\n\n- repeated input tokens;\n- output tokens;\n- estimated cost;\n- latency;\n- continuity errors;\n- violations of locked decisions;\n- violations of tool permissions;\n- handoff quality;\n- unnecessary reruns of expensive commands.\n\nThe goal is not to claim a magic percentage improvement. The goal is to measure, reproducibly:\n\nHow many tokens and errors are we paying for simply because the agent has to rediscover state we already produced?\n\n## Demo\n\nFor the Hermes Agent Challenge, I created an experimental Hermes integration inside the `klickdskill` repository.\n\nThe demo uses Hermes Agent to drive the local `.klickd` Context Cost Benchmark.\n\nIf the embedded agent session does not render correctly, here is the relevant Hermes output:\n\n```\nsession_id: 20260523_004058_85115c\n\nExisting artifacts from 2026-05-23 were used. No rerun was needed.\n\nToken-proxy totals:\n- Cold: 310\n- Paste: 6570\n- Klickd: 5270\n\nVerified artifacts:\n- report.md\n- summary.csv\n- raw_runs.jsonl\n- artifacts/sample_test.log\n\nNo publishes, git pushes, or external tool calls were performed.\n```\n\nThe live Hermes run used:\n\n- Hermes Agent v0.14.0\n- OpenRouter free model route\n- capped API key with no paid budget\n- local dry-run benchmark\n- no production deployment\n- no package publishing\n- no external posting\n\nHermes session:\n\n```\n20260523_004058_85115c\n```\n\nHermes was asked to use the `klickd-context-cost` skill, inspect the benchmark outputs, and avoid rerunning work if durable artifacts already existed.\n\nThe key result:\n\n```\nExisting artifacts from 2026-05-23 were used. 
No rerun was needed.\n```\n\nThat matters because one of the core ideas in `.klickd v4` is that agents should not spend tokens or compute rediscovering output that already exists.\n\nThe dry-run produced these local artifacts:\n\n```\nbenchmarks/context_cost/results/2026-05-23/\n├── report.md\n├── summary.csv\n├── raw_runs.jsonl\n└── artifacts/\n    └── sample_test.log\n```\n\nThe benchmark output was explicitly marked as a **whitespace token proxy**, not a provider-token measurement. This is important: these are not OpenAI, Anthropic, or OpenRouter tokenizer counts. They are deterministic local proxy values for early validation.\n\nCurrent dry-run totals:\n\n| Condition | Token-proxy total |\n|---|---|\n| Cold start | 310 |\n| Full context pasted | 6570 |\n| `.klickd` structured context | 5270 |\n\nThe useful result is not “`.klickd` reduces cost by X%.” That would be premature.\n\nThe useful result is:\n\nThe benchmark harness can now compare repeated context strategies, produce raw evidence, persist artifacts, and let Hermes inspect those artifacts instead of rerunning the same work.\n\n### Verification artifacts\n\nOne lesson from real agent workflows is that agents often rerun expensive commands just to recover output they already produced.\n\nThe benchmark therefore includes a `verification_artifacts[]` pattern inspired by this idea:\n\n```\ncommand 2>&1 | tee .test-output/<scope>.log\n```\n\nInstead of rerunning the test suite to find a failure, the agent can inspect the persisted artifact:\n\n```\ngrep -n FAIL .test-output/full.log\n```\n\nIn `.klickd v4`, that becomes structured state:\n\n```\n{\n  \"command\": \"npm test\",\n  \"artifact_path\": \".test-output/vitest.log\",\n  \"status\": \"failed\",\n  \"query_hint\": \"grep -n FAIL .test-output/vitest.log\",\n  \"checked_at\": \"2026-05-23T00:00:00Z\",\n  \"retention\": \"latest\",\n  \"scope\": \"project\"\n}\n```\n\nThis turns agent memory into something more operational:\n\n- what the 
agent knows;\n- what the agent must verify;\n- what the agent is not allowed to do without approval;\n- where the evidence lives;\n- what happened last time.\n\n## Code\n\nRepository:\n\n[https://github.com/Davincc77/klickdskill](https://github.com/Davincc77/klickdskill)\n\nHermes POC integration path:\n\n```\nintegrations/hermes/\n├── README.md\n├── skill/\n│   └── SKILL.md\n├── plugin/\n│   ├── plugin.yaml\n│   └── __init__.py\n├── scripts/\n│   └── run_context_cost_benchmark.py\n└── tests/\n```\n\nContext Cost Benchmark path:\n\n```\nbenchmarks/context_cost/\n├── RFC.md\n├── runner.py\n├── fixtures/\n│   ├── baseline/\n│   ├── klickd/\n│   ├── prompts/\n│   ├── validation/\n│   ├── verification_artifacts/\n│   └── edge_cases/\n├── results/\n└── tests/\n```\n\nCurrent benchmark pieces:\n\n- RFC-003: Context Cost Benchmark\n- local dry-run runner\n- fixture validation\n- deterministic token proxy\n- CSV / JSONL / Markdown reports\n- edge-case fixtures for:\n  - migration/version break;\n  - tool-call failure recovery;\n  - multi-session handoff.\n\nThe Hermes integration currently includes:\n\n- a Hermes-facing skill;\n- an experimental plugin scaffold;\n- a wrapper script that runs the local benchmark;\n- tests for the wrapper;\n- explicit safety constraints:\n  - no provider calls from the wrapper;\n  - no paid resources;\n  - no publishing;\n  - no production deployment;\n  - no secrets.\n\n### My Tech Stack\n\n- **Hermes Agent**: open-source, self-hosted agent runtime\n  [https://github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent)\n- **Hermes Agent docs**\n  [https://hermes-agent.app/en/docs](https://hermes-agent.app/en/docs)\n- **`.klickd` / `klickdskill`**: portable encrypted AI context format\n  [https://github.com/Davincc77/klickdskill](https://github.com/Davincc77/klickdskill)\n- **`.klickd` official page**\n  [https://klickd.app/klickdskill](https://klickd.app/klickdskill)\n- **Python SDK**: local `.klickd` loading / saving\n\nCurrent development install, 
until PyPI is updated:\n\n```\npip install \"git+https://github.com/Davincc77/klickdskill.git@main#subdirectory=packages/pypi/klickd\"\n```\n\nCurrent Python import:\n\n```python\nfrom klickd import load_klickd, save_klickd\n```\n\n- **GitHub Actions**: test vectors and package integrity checks\n- **CSV / JSONL / Markdown**: benchmark reports\n- **Local verification artifacts**: persisted logs for agent inspection\n- **OpenRouter free model route**: used only to run the Hermes agent session for the demo\n\n## How I Used Hermes Agent\n\nHermes Agent is used as the workflow runner for the benchmark.\n\nThe `.klickd` file is not meant to replace Hermes memory or Hermes skills. Instead, it gives Hermes a portable external state artifact it can load before work starts.\n\nHermes is responsible for:\n\n- running the benchmark task;\n- reading fixture context;\n- executing local dry-run commands;\n- inspecting generated artifacts;\n- summarizing benchmark results;\n- respecting approval and verification boundaries.\n\n`.klickd` is responsible for carrying:\n\n- project state;\n- locked decisions;\n- tool permissions;\n- handoff notes;\n- verification gates;\n- human veto rules;\n- claim sources;\n- verification artifacts.\n\nThis is useful because multi-agent systems need more than agent-to-agent communication.\n\nIf A2A defines how agents talk, `.klickd` explores what portable state they carry between tasks, tools, models, and sessions.\n\nThe Hermes integration is therefore not about making a chatbot remember more. 
It is about testing whether an open-source agent runtime can operate with structured, portable context instead of repeatedly reconstructing the same state.\n\nThe goal is to reduce:\n\n- repeated prompt context;\n- hallucinated continuations;\n- forgotten decisions;\n- unsafe actions;\n- unnecessary reruns;\n- handoff failures.\n\nThe larger idea is that agent memory should become infrastructure:\n\nPortable state, explicit constraints, verification artifacts, and human approval boundaries.\n\nIn short:\n\nHermes runs the workflow. `.klickd` carries the state.\n\n## What I Learned\n\nThe first useful result was not a performance number. It was a workflow result.\n\nHermes correctly used the existing benchmark artifacts instead of rerunning the dry-run unnecessarily.\n\nThat matters because a lot of agent waste is not only token waste. It is also repeated execution waste.\n\nAgents often:\n\n- rerun tests to rediscover failures;\n- reread long logs from context;\n- rebuild state from previous messages;\n- regenerate summaries that already exist;\n- ask the model to infer what a file could have told it deterministically.\n\nThe benchmark and Hermes POC make that waste visible.\n\nThis also clarified the role of `.klickd`:\n\n`.klickd` should not only remember preferences. 
It should help agents know:\n\n- what state exists;\n- what evidence exists;\n- what claims were executed, inspected, or assumed;\n- what actions require human approval;\n- what artifacts should be read before rerunning work.\n\nThat is why `.klickd v4` is moving beyond portable memory toward a more operational layer:\n\n```\nportable encrypted context\n+ project memory\n+ verification gates\n+ human veto\n+ claim sources\n+ verification artifacts\n+ migration safety\n```\n\n## Sources\n\nHermes Agent Challenge:\n\n[https://dev.to/challenges/hermes-agent-2026-05-15](https://dev.to/challenges/hermes-agent-2026-05-15)\n\nHermes Agent repository:\n\n[https://github.com/NousResearch/hermes-agent](https://github.com/NousResearch/hermes-agent)\n\nHermes Agent documentation:\n\n[https://hermes-agent.app/en/docs](https://hermes-agent.app/en/docs)\n\n`.klickd` / `klickdskill` repository:\n\n[https://github.com/Davincc77/klickdskill](https://github.com/Davincc77/klickdskill)\n\n`.klickd` official page:\n\n[https://klickd.app/klickdskill](https://klickd.app/klickdskill)\n\nRelated article on preserving command output for agents:\n\n[https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427](https://dev.to/tacoda/dont-make-the-agent-re-run-the-test-suite-to-find-the-failure-427)\n\n## Final Note\n\nThis is still early.\n\nThe benchmark does not yet claim provider-token savings. The current numbers are a deterministic local proxy. The next step is to run the same structure against real provider usage and compare actual input/output tokens, latency, and continuity failures.\n\nBut the architecture is now testable:\n\n- Hermes can act as the workflow runner.\n- `.klickd` can act as the portable state layer. 
- The benchmark can produce raw evidence.\n- Verification artifacts can prevent unnecessary reruns.\n- The system can evolve without breaking older `.klickd` files.\n\nThat is the direction I want to keep exploring.\n\nOne soul. Any model. Any agent.", "url": "https://wpnews.pro/news/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd", "canonical_source": "https://dev.to/davincc77/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd-1k50", "published_at": "2026-05-23 01:18:46+00:00", "updated_at": "2026-05-23 01:31:27.745773+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "developer-tools"], "entities": ["Hermes Agent", ".klickd", "Context Cost Benchmark", "klickdskill"], "alternates": {"html": "https://wpnews.pro/news/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd", "markdown": "https://wpnews.pro/news/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd.md", "text": "https://wpnews.pro/news/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd.txt", "jsonld": "https://wpnews.pro/news/one-soul-any-model-portable-memory-for-open-source-agents-with-klickd.jsonld"}}