[ARTICLE · art-17738] src=dev.to pub= topic=large-language-models verified=true sentiment=· neutral

Why I'm building Hyphae: provenance over prediction (and the 3-line baseline that tied it)

A developer built Hyphae, a Rust-based cognitive substrate that answers queries by emitting byte-identical quotations of stored memory fragments over a SHA-256 hash-chained journal, with no large language model in the answering path. The project's key insight emerged when a three-line baseline—simply printing retrieved fragments verbatim—tied Hyphae on every correctness and grounding metric, revealing that verifiable provenance is a property of verbatim quotation itself, not of the system's complexity. The developer now focuses on adding tamper-evident provenance to any extractive retrieval system, using an Ed25519 signature anchored outside the store to prevent chain-aware attackers from rewriting the hash chain.

5 min read · published May 29, 2026

A few months ago I set out to build a cognitive substrate without a large language model in the answering path. I had a thesis I liked, a Rust workspace, and a lot of conviction.

Then I wrote a three-line baseline that tied it on every metric I cared about.

This is the story of why that was the best thing that happened to the project — and why I'm still building it, just pointed at a sharper target.

When a language model answers a grounded question, it paraphrases its sources. That paraphrase is fluent, often correct, and — this is the part that bothers me — impossible to bind back to its source byte-for-byte. You can cite a document. You cannot prove, after the fact, that the words in the answer are the words that were stored, unaltered, at a known position.

For a chatbot that doesn't matter. For anything that has to be audited — a compliance trail, a medical or legal memory, an agent acting on your behalf over months — it matters a lot. "Trust me, I read the docs" is not a property you can verify.

So I started building Hyphae: a substrate that answers by emitting byte-identical quotations of stored memory fragments, over a SHA-256 hash-chained journal, with no LLM in the cognition path. Rust, CPU-only, a single binary.

The shape of it is simple:

// Every stored fragment is appended verbatim to a hash chain.
let (seq, head) = journal.append("memory_op", fragment.bytes())?;

// An answer span is a byte-identical quotation of a stored fragment.
// Tamper with any historical entry and the recomputed chain breaks
// at the next link — verify() localises exactly where.
journal.verify()?;

Nothing here is cryptographically novel. Hash-chained logs are old and well understood — Haber & Stornetta in 1991, Merkle before that, Certificate Transparency, git. I want to be honest about that up front, because the interesting part isn't the chain.
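To make the append/verify shape concrete, here is a minimal sketch of a hash-chained journal. It is not Hyphae's implementation: the `Journal`, `Entry`, and `link_hash` names are hypothetical, and the standard library's `DefaultHasher` stands in for SHA-256 so the example runs without external crates. A real journal would use a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// ASSUMPTION: DefaultHasher is a NON-cryptographic stand-in for SHA-256,
// used only so this sketch needs nothing beyond the standard library.
fn link_hash(prev: u64, kind: &str, payload: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(prev);
    h.write_usize(kind.len()); // length prefix keeps (kind, payload) unambiguous
    h.write(kind.as_bytes());
    h.write(payload);
    h.finish()
}

struct Entry {
    kind: String,
    payload: Vec<u8>,
    hash: u64, // hash over (previous hash, kind, payload)
}

#[derive(Default)]
struct Journal {
    entries: Vec<Entry>,
}

impl Journal {
    /// Appends the payload verbatim and returns (sequence number, new head).
    fn append(&mut self, kind: &str, payload: &[u8]) -> (usize, u64) {
        let prev = self.entries.last().map_or(0, |e| e.hash);
        let hash = link_hash(prev, kind, payload);
        self.entries.push(Entry {
            kind: kind.to_string(),
            payload: payload.to_vec(),
            hash,
        });
        (self.entries.len() - 1, hash)
    }

    /// Recomputes every link; Err(seq) names the first entry that fails.
    fn verify(&self) -> Result<(), usize> {
        let mut prev = 0;
        for (seq, e) in self.entries.iter().enumerate() {
            if link_hash(prev, &e.kind, &e.payload) != e.hash {
                return Err(seq);
            }
            prev = e.hash;
        }
        Ok(())
    }
}
```

In this sketch, editing the payload of entry 1 in place makes `verify()` return `Err(1)`, which is the localisation property the snippet above describes.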

I wanted to show Hyphae was better than an LLM+RAG pipeline at grounded answering. So I built the comparison properly: a real retriever, reranking, six models across three retrieval modes, two corpora, twelve metrics.

Then a reviewer asked the obvious question: what does a trivial baseline score?

So I wrote echo: a few lines that just print the retrieved fragment back. It tied Hyphae on every correctness and grounding metric. So did echo + journal.
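For scale, a baseline of this kind can be as small as the sketch below. The function name and signature are assumptions for illustration, not the code used in the benchmark, and the retriever that produces `retrieved` is assumed to exist elsewhere.

```rust
/// Hypothetical echo baseline: return the top retrieved fragment unchanged.
/// No generation, no paraphrase; the "answer" is the stored text itself.
fn echo<'a>(retrieved: &[&'a str]) -> Option<&'a str> {
    retrieved.first().copied()
}
```

Because the output is the retrieved fragment itself, any metric that asks "is the answer supported by the source?" is satisfied by construction.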

That stings for about a day. Then it becomes the whole point.

The measured correctness and grounding were never properties of my system. They are properties of verbatim quotation itself. If you emit a stored span unchanged, of course it's "grounded" — it is the source. Hyphae's seventeen subsystems weren't what made the answers auditable. The verbatim-emission-over-a-journal layer was. And that layer is addable to any extractive retrieval system — it isn't Hyphae-specific at all.

So I stopped claiming Hyphae was a better brain and started claiming something narrower and, I think, truer:

Verifiable provenance is a property you can add to grounded retrieval. A paraphrase destroys byte-level bindability to its source; a verbatim quotation preserves it, and a hash chain makes that binding independently auditable.
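One way to picture byte-level bindability is a check that an answer span equals the stored bytes at a recorded position. This is a sketch under assumptions: `Span` and `is_bound` are hypothetical names, and the store is modelled as a plain list of byte fragments.

```rust
/// Hypothetical provenance record: where the quoted bytes live in the store.
struct Span {
    fragment: usize, // index of the stored fragment
    start: usize,    // byte offset inside that fragment
    len: usize,      // number of quoted bytes
}

/// True only if `answer` is byte-identical to the stored bytes at `span`.
fn is_bound(store: &[Vec<u8>], span: &Span, answer: &[u8]) -> bool {
    let Some(fragment) = store.get(span.fragment) else {
        return false;
    };
    let Some(end) = span.start.checked_add(span.len) else {
        return false;
    };
    // `get` returns None for out-of-range spans instead of panicking.
    fragment.get(span.start..end) == Some(answer)
}
```

A verbatim quotation passes this check; a paraphrase, even a correct one, has no span for which it can pass.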

The contribution isn't the hash chain. It's the observation, and the measurement of it against eighteen LLM configurations (six models across three retrieval modes) and a tamper-detection benchmark.

Once you claim "tamper-evident," people who know what they're doing immediately ask where it breaks. Good. The threat model is the product.

A bare hash chain catches a store-only attacker who edits a record in place. It does not catch a chain-aware attacker who recomputes every hash forward and rewrites the head, because the head lives in the same store. So I anchor the head with an Ed25519 signature held outside the store; an attacker who controls only the store cannot re-sign a rewritten head. That closes the rewrite attack.
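The difference between the two attackers can be sketched as follows. Two simplifications are assumptions of this example, not the project's design: `DefaultHasher` stands in for SHA-256, and the out-of-store anchor is modelled as a plain copy of the head rather than an Ed25519 signature over it. `Store`, `chain`, and `matches_anchor` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// ASSUMPTION: non-cryptographic stand-in for SHA-256 (std-only sketch).
fn link_hash(prev: u64, payload: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u64(prev);
    h.write(payload);
    h.finish()
}

/// Recomputes the whole chain from the payloads alone.
fn chain(payloads: &[Vec<u8>]) -> Vec<u64> {
    let mut prev = 0;
    payloads
        .iter()
        .map(|p| {
            prev = link_hash(prev, p);
            prev
        })
        .collect()
}

/// Everything in here is writable by an attacker who controls the store.
struct Store {
    payloads: Vec<Vec<u8>>,
    hashes: Vec<u64>,
}

impl Store {
    fn new(payloads: Vec<Vec<u8>>) -> Self {
        let hashes = chain(&payloads);
        Store { payloads, hashes }
    }

    fn head(&self) -> u64 {
        self.hashes.last().copied().unwrap_or(0)
    }

    /// The bare-chain check: stored hashes agree with the stored payloads.
    fn internally_consistent(&self) -> bool {
        chain(&self.payloads) == self.hashes
    }

    /// The anchored check: also compare the head with a value kept
    /// outside the store (a stand-in for verifying a signature over it).
    fn matches_anchor(&self, anchored_head: u64) -> bool {
        self.internally_consistent() && self.head() == anchored_head
    }
}
```

With a real signature, the auditor would need only the public key and the signed head, not a trusted copy of the head itself.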

But a single signature pins a valid head, not the latest one. Every head the journal ever had was, at its time, legitimately signed. An attacker can roll back to an earlier state and replay its genuine-but-stale anchor — and a lone signature check accepts it. So the heads get published to an append-only, hash-chained ledger, and an auditor checks the current head against the ledger's tail:

// A single signature pins *a* valid head.
// An append-only ledger pins *the latest* one.
verify_fresh_head(&current_head, &ledger, &verifying_key); // rollback rejected

That's the pattern from Certificate Transparency and git, applied to memory provenance: the value isn't the chain, it's publishing the head to a monotonic log that third parties can compare.
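The freshness rule itself is small enough to sketch. This is not the `verify_fresh_head` from the snippet above: signature verification and the ledger's own hash chain are left out, and `Ledger`, `Freshness`, and `check_fresh_head` are hypothetical names used only to show the comparison against the ledger's tail.

```rust
/// Hypothetical append-only ledger of published heads, oldest first.
struct Ledger {
    heads: Vec<u64>,
}

#[derive(Debug, PartialEq)]
enum Freshness {
    /// The presented head is the ledger's tail.
    Fresh,
    /// The head was genuinely published once, but later heads exist.
    Stale { published_at: usize },
    /// The head was never published at all.
    Unknown,
}

fn check_fresh_head(current: u64, ledger: &Ledger) -> Freshness {
    // Search from the tail so the most recent publication wins.
    match ledger.heads.iter().rposition(|&h| h == current) {
        Some(i) if i + 1 == ledger.heads.len() => Freshness::Fresh,
        Some(i) => Freshness::Stale { published_at: i },
        None => Freshness::Unknown,
    }
}
```

A rolled-back journal presents a head that lands in the `Stale` arm: it was valid when published, which is exactly why a signature check alone accepts it.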

And I keep a column for what I haven't closed: a store that withholds later ledger entries is only caught once an auditor gets the true tail from an external witness (a timestamp authority, a gossiped tree head). That's deployment work, and I'd rather write it down than pretend it's solved.
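The open gap can be stated as a check the auditor is only able to run once it has a witness observation. In this sketch `TreeHead` and `consistent_with_witness` are hypothetical names, and how the witness value reaches the auditor (timestamp authority, gossip) is assumed rather than shown.

```rust
/// Hypothetical witness observation: "when I looked, the ledger had
/// `len` entries and its last entry was `tail`".
struct TreeHead {
    len: usize,
    tail: u64,
}

/// True if the ledger the store presents still contains the entry the
/// witness saw, at the position the witness saw it.
fn consistent_with_witness(presented: &[u64], witness: &TreeHead) -> bool {
    witness.len == 0 || presented.get(witness.len - 1) == Some(&witness.tail)
}
```

A store that withholds later entries presents a ledger shorter than `witness.len`, so the lookup fails. Without the witness value there is nothing to compare against, which is why this remains deployment work.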

The direction got clearer the moment the echo baseline humbled me. I'm not building a better answer engine. I'm building provenance as a first-class, measurable property of grounded AI, in the open. Concretely:

The substrate, the LLM+RAG comparator, every result envelope, the tamper-detection experiment, the provenance benchmark, and the full preprint are public. Code is Apache-2.0; the docs, corpora, and preprint are CC-BY-4.0.

I'm a solo, self-taught founder building this in public, which means the dead ends are public too — the echo baseline being the best example. If you work on retrieval, tamper-evident logs, or grounded generation, I'd genuinely like to hear where you think this breaks. The threat model only gets better when someone smarter than me attacks it.
