{"slug": "why-i-m-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that", "title": "Why I'm building Hyphae: provenance over prediction (and the 3-line baseline that tied it)", "summary": "A developer built Hyphae, a Rust-based cognitive substrate that answers queries by emitting byte-identical quotations of stored memory fragments over a SHA-256 hash-chained journal, with no large language model in the answering path. The project's key insight emerged when a three-line baseline—simply printing retrieved fragments verbatim—tied Hyphae on every correctness and grounding metric, revealing that verifiable provenance is a property of verbatim quotation itself, not of the system's complexity. The developer now focuses on adding tamper-evident provenance to any extractive retrieval system, using an Ed25519 signature anchored outside the store to prevent chain-aware attackers from rewriting the hash chain.", "body_md": "A few months ago I set out to build a cognitive substrate without a large language model in the answering path. I had a thesis I liked, a Rust workspace, and a lot of conviction.\n\nThen I wrote a three-line baseline that tied it on every metric I cared about.\n\nThis is the story of why that was the best thing that happened to the project — and why I'm still building it, just pointed at a sharper target.\n\nWhen a language model answers a grounded question, it *paraphrases* its sources. That paraphrase is fluent, often correct, and — this is the part that bothers me — **impossible to bind back to its source byte-for-byte**. You can cite a document. You cannot prove, after the fact, that the words in the answer are the words that were stored, unaltered, at a known position.\n\nFor a chatbot that doesn't matter. For anything that has to be *audited* — a compliance trail, a medical or legal memory, an agent acting on your behalf over months — it matters a lot. 
\"Trust me, I read the docs\" is not a property you can verify.\n\nSo I started building **Hyphae**: a substrate that answers by emitting **byte-identical quotations** of stored memory fragments, over a SHA-256 hash-chained journal, with no LLM in the cognition path. Rust, CPU-only, a single binary.\n\nThe shape of it is simple:\n\n```\n// Every stored fragment is appended verbatim to a hash chain.\nlet (seq, head) = journal.append(\"memory_op\", fragment.bytes())?;\n\n// An answer span is a byte-identical quotation of a stored fragment.\n// Tamper with any historical entry and the recomputed chain breaks\n// at the next link; verify() localises exactly where.\njournal.verify()?;\n```\n\nNothing here is cryptographically novel. Hash-chained logs are old and well understood: Haber & Stornetta in 1991, Merkle before that, Certificate Transparency, git. I want to be honest about that up front, because the interesting part isn't the chain.\n\nI wanted to show Hyphae was *better* than an LLM+RAG pipeline at grounded answering. So I built the comparison properly: a real retriever, reranking, six models across three retrieval modes, two corpora, twelve metrics.\n\nThen a reviewer asked the obvious question: *what does a trivial baseline score?*\n\nSo I wrote `echo`, a few lines that just print the retrieved fragment back. It tied Hyphae on every correctness and grounding metric. So did `echo + journal`.\n\nThat stings for about a day. Then it becomes the whole point.\n\nThe measured correctness and grounding were never properties of *my system*. They are properties of **verbatim quotation** itself. If you emit a stored span unchanged, of course it's \"grounded\": it *is* the source. Hyphae's seventeen subsystems weren't what made the answers auditable. The verbatim-emission-over-a-journal layer was. 
And that layer is **addable to any extractive retrieval system** — it isn't Hyphae-specific at all.\n\nSo I stopped claiming Hyphae was a better brain and started claiming something narrower and, I think, truer:\n\nVerifiable provenance is a property you can add to grounded retrieval. A paraphrase destroys byte-level bindability to its source; a verbatim quotation preserves it, and a hash chain makes that binding independently auditable.\n\nThe contribution isn't the hash chain. It's the *observation*, and the *measurement* of it against eighteen LLM configurations and a tamper-detection benchmark.\n\nOnce you claim \"tamper-evident,\" people who know what they're doing immediately ask where it breaks. Good. The threat model is the product.\n\nA bare hash chain catches a store-only attacker who edits a record in place. It does **not** catch a *chain-aware* attacker who recomputes every hash forward and rewrites the head — because the head lives in the same store. So I anchor the head with an Ed25519 signature held outside the store (the attacker can't re-sign). That closes it.\n\nBut a single signature pins *a* valid head, not *the latest* one. Every head the journal ever had was, at its time, legitimately signed. An attacker can roll back to an earlier state and replay its genuine-but-stale anchor — and a lone signature check accepts it. 
So the heads get published to an **append-only, hash-chained ledger**, and an auditor checks the current head against the ledger's *tail*:\n\n```\n// A single signature pins *a* valid head.\n// An append-only ledger pins *the latest* one.\nverify_fresh_head(&current_head, &ledger, &verifying_key); // rollback rejected\n```\n\nThat's the pattern from Certificate Transparency and git, applied to memory provenance: the value isn't the chain, it's publishing the head to a monotonic log that third parties can compare.\n\nAnd I keep a column for what I *haven't* closed: a store that withholds later ledger entries is only caught once an auditor gets the true tail from an external witness (a timestamp authority, a gossiped tree head). That's deployment work, and I'd rather write it down than pretend it's solved.\n\nThe direction got clearer the moment the echo baseline humbled me. I'm not building a better answer engine. I'm building **provenance as a first-class, measurable property of grounded AI**, in the open. Concretely:\n\nThe substrate, the LLM+RAG comparator, every result envelope, the tamper-detection experiment, the provenance benchmark, and the full preprint are public. Code is Apache-2.0; the docs, corpora, and preprint are CC-BY-4.0.\n\nI'm a solo, self-taught founder building this in public, which means the dead ends are public too — the echo baseline being the best example. If you work on retrieval, tamper-evident logs, or grounded generation, I'd genuinely like to hear where you think this breaks. 
The threat model only gets better when someone smarter than me attacks it.", "url": "https://wpnews.pro/news/why-i-m-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that", "canonical_source": "https://dev.to/terrizoaguimor/why-im-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that-tied-it-2e32", "published_at": "2026-05-29 14:55:20+00:00", "updated_at": "2026-05-29 15:13:34.174490+00:00", "lang": "en", "topics": ["large-language-models", "ai-safety", "ai-research", "ai-infrastructure", "ai-agents"], "entities": ["Hyphae", "Rust"], "alternates": {"html": "https://wpnews.pro/news/why-i-m-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that", "markdown": "https://wpnews.pro/news/why-i-m-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that.md", "text": "https://wpnews.pro/news/why-i-m-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that.txt", "jsonld": "https://wpnews.pro/news/why-i-m-building-hyphae-provenance-over-prediction-and-the-3-line-baseline-that.jsonld"}}