Building a Replayable Decision Kernel in Rust

A developer built Calybris Core, a Rust crate that provides a deterministic, replayable decision kernel for systems that need to prove decisions after the fact. The crate evaluates candidates against policy snapshots to produce decisions with digests and optional write-ahead logs, enabling independent verification that the same input and policy yield the same result. It avoids floating-point arithmetic in favor of integer amounts and basis points to ensure repeatable behavior.

I built Calybris Core (https://github.com/emirhuseynrmx/calybris-core) because I kept running into the same uncomfortable question in decision-heavy systems: after the system says "yes", "no", or "use this instead", what exactly can we prove later?

Not prove in the formal-methods sense. I mean the practical engineering version:

Calybris Core is my attempt to make that boundary small, deterministic, and boring.

It is not an LLM framework. It is not an exchange. It is not a strategy engine. It is not a web service. It is a Rust core primitive:

    candidate + policy constraints -> decision + digests + optional WAL + budget proof

The first reference examples are LLM routing and pre-trade admission guards, but the crate itself is domain-neutral.

Repo: https://github.com/emirhuseynrmx/calybris-core
Crate: https://crates.io/crates/calybris-core
Docs: https://docs.rs/calybris-core

A lot of systems have a hidden decision point that looks simple from the outside:

    request comes in
    system checks constraints
    system returns allow / substitute / reject

But when something goes wrong, that simple decision becomes hard to reconstruct. Maybe the model was changed. Maybe a budget was exceeded. Maybe a cheaper fallback was selected. Maybe an operator needs to explain why an action was rejected. Maybe an audit log was modified after the fact.

The typical response is to add more logs. That helps, but logs alone are not the same as replayable decisions. I wanted the core decision result to carry enough structure that an independent verifier can ask: if I replay the same input against the same policy snapshot, do I get the same decision? That became the central design constraint.

The kernel module evaluates a KernelInput against a validated PolicySnapshot.
The result is a KernelDecision:

    ExecuteRequested
    Substitute
    Reject

The decision contains the selected candidate, reason, estimated cost, utility, counterfactual fields, evaluated/eligible counts, and policy/catalog epochs. The important part is not the specific domain. The important part is that the decision is deterministic and replayable.

In code, the shape is intentionally direct:

    use calybris_core::kernel::*;
    use calybris_core::verify::{verify_decision, VerifyResult};

    // `snapshot` is a validated PolicySnapshot; `input` is a KernelInput.
    let decision = snapshot.prescribe(input);
    // Replaying the same input against the same snapshot must validate.
    assert_eq!(verify_decision(&snapshot, input, &decision), VerifyResult::Valid);

The hot path deliberately avoids:

The crate root uses:

    #![forbid(unsafe_code)]

That is not magic, but it is a useful line in the sand.

The reference use cases both involve costs, budgets, confidence, risk, and utility. It would be easy to reach for f64. I avoided it. Calybris uses integer amounts and basis points. Financial amounts are fixed-point microcents. Quality, risk, confidence, and policy thresholds are represented as integer basis points. That keeps replay behavior less surprising. For audit-oriented code, "close enough" is a dangerous phrase. If a decision depends on a threshold, I want the arithmetic to be explicit and repeatable.

Replay alone is not enough. You also need stable fingerprints. Calybris computes canonical SHA-256 digests for:

The digest layouts are version-tagged byte layouts, not hashes of arbitrary JSON. That distinction matters. JSON is great for transport and inspection, but field order and serialization choices are not a good audit boundary. The digest tags are explicit:

    calypol1
    calyinp1
    calydcn1
    calyldg1

Policy models are sorted before hashing. Ledger tenants are sorted before hashing. A logically equivalent snapshot should not get a different fingerprint because a map happened to iterate differently.
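The sort-then-hash and version-tag rules described above can be sketched with the standard library alone. This is a minimal illustration, not the crate's actual byte layout: the `ModelEntry` type, its fields, the big-endian length-prefixed encoding, and the assumption that ids are unique are all hypothetical. Only the `calypol1` tag comes from the text. In a real digest these bytes would then be fed to SHA-256; that step is left out here to keep the sketch dependency-free.

```rust
/// Hypothetical policy entry; the field names are assumptions for illustration.
#[derive(Clone, Debug)]
struct ModelEntry {
    id: String,
    cost_microcents: u64,
    quality_bps: u16,
}

/// Build a canonical, version-tagged byte layout for a set of entries.
/// Entries are sorted by id first, so input order cannot affect the bytes.
fn canonical_policy_bytes(entries: &[ModelEntry]) -> Vec<u8> {
    let mut sorted: Vec<&ModelEntry> = entries.iter().collect();
    sorted.sort_by(|a, b| a.id.as_bytes().cmp(b.id.as_bytes()));

    let mut out = Vec::new();
    out.extend_from_slice(b"calypol1"); // version tag named in the text
    out.extend_from_slice(&(sorted.len() as u64).to_be_bytes());
    for e in sorted {
        // Length-prefix the variable-size field so boundaries are unambiguous.
        out.extend_from_slice(&(e.id.len() as u64).to_be_bytes());
        out.extend_from_slice(e.id.as_bytes());
        out.extend_from_slice(&e.cost_microcents.to_be_bytes());
        out.extend_from_slice(&e.quality_bps.to_be_bytes());
    }
    out
}

fn main() {
    let a = ModelEntry { id: "small".into(), cost_microcents: 1_200, quality_bps: 7_500 };
    let b = ModelEntry { id: "large".into(), cost_microcents: 9_800, quality_bps: 9_400 };

    let first = canonical_policy_bytes(&[a.clone(), b.clone()]);
    let second = canonical_policy_bytes(&[b, a]);

    // Same logical content, different insertion order, identical bytes.
    assert_eq!(first, second);
    println!("canonical layout is {} bytes", first.len());
}
```

Because the byte layout is fixed by the function rather than by a serializer, changing the layout means changing the tag, which is what makes a version-tagged digest safe to compare across releases.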
A decision can be wrapped in an audit bundle:

    policy_digest
    input_digest
    decision_digest
    replay_valid

The verifier checks the structural decision, not just a string. If you change the input, replay fails. If you change the decision, replay fails. If you use the wrong policy, replay fails. If the digest fields do not match canonical recomputation, replay fails.

That is the reason I have been using the phrase "proof-carrying decision core", although I am still looking for feedback on whether that wording is too strong. To be clear: this is not a formal proof system. It is a replayable evidence bundle.

The crate also includes an optional write-ahead log. Each WAL entry contains:

The unkeyed mode is useful for corruption detection and basic tamper evidence. The keyed mode uses HMAC-SHA256, which is the mode you would use if an attacker might rewrite entries and recompute hashes.

The audited WAL path looks like this:

    prescribe -> audit_bundle -> append_audited -> replay_audited_wal

Replay fails closed if the chain is broken or if any policy/input/decision digest does not match.

I intentionally did not put secret storage, key rotation, file locking, or multi-process coordination inside this crate. Those are deployment concerns and should be owned by the embedding system.

The budget engine is another small core primitive. The invariant is:

    remaining + reserved + committed_lifetime == initial

A reservation removes spendable balance. A commit turns a reservation into lifetime committed spend. A release returns the hold. A top-up extends initial and remaining budget.

The budget engine uses compare-and-swap (CAS) for the hot balance updates and mutex-protected metadata maps for the surrounding state. The invariant is checked on frozen snapshots. Multi-step operations may have transient internal states, so the docs are careful not to claim every mid-operation snapshot is linearizable. That distinction matters. Audit docs should say what is guaranteed, not what sounds good.
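The budget invariant described above can be sketched as a small single-threaded ledger. This is an illustration under stated assumptions, not the crate's implementation: the `Budget` type and its method names are hypothetical, amounts are plain `u64` microcents, and the CAS-based hot path and mutex-protected metadata mentioned above are deliberately left out.

```rust
/// Hypothetical single-threaded budget ledger (not the crate's API).
/// All amounts are integer microcents.
#[derive(Debug)]
struct Budget {
    initial: u64,
    remaining: u64,
    reserved: u64,
    committed_lifetime: u64,
}

impl Budget {
    fn new(initial: u64) -> Self {
        Budget { initial, remaining: initial, reserved: 0, committed_lifetime: 0 }
    }

    /// Reserve: move `amount` from the spendable balance into a hold.
    fn reserve(&mut self, amount: u64) -> bool {
        if amount > self.remaining {
            return false;
        }
        self.remaining -= amount;
        self.reserved += amount;
        true
    }

    /// Commit: turn part of the hold into lifetime committed spend.
    fn commit(&mut self, amount: u64) -> bool {
        if amount > self.reserved {
            return false;
        }
        self.reserved -= amount;
        self.committed_lifetime += amount;
        true
    }

    /// Release: return part of the hold to the spendable balance.
    fn release(&mut self, amount: u64) -> bool {
        if amount > self.reserved {
            return false;
        }
        self.reserved -= amount;
        self.remaining += amount;
        true
    }

    /// Top-up: extend both the initial and the remaining budget.
    fn top_up(&mut self, amount: u64) -> bool {
        match (self.initial.checked_add(amount), self.remaining.checked_add(amount)) {
            (Some(i), Some(r)) => {
                self.initial = i;
                self.remaining = r;
                true
            }
            _ => false, // refuse a top-up that would overflow
        }
    }

    /// remaining + reserved + committed_lifetime == initial
    fn invariant_holds(&self) -> bool {
        let sum = self.remaining as u128
            + self.reserved as u128
            + self.committed_lifetime as u128;
        sum == self.initial as u128
    }
}

fn main() {
    let mut b = Budget::new(10_000);
    assert!(b.reserve(4_000));
    assert!(b.commit(2_500));
    assert!(b.release(1_500));
    assert!(b.top_up(5_000));
    assert!(!b.reserve(1_000_000)); // over-reservation is rejected
    assert!(b.invariant_holds());
    println!("{:?}", b);
}
```

Every operation moves an amount between two fields (or adds the same amount to both sides of the equation, in the top-up case), so the invariant holds after each completed call. A concurrent version would only be able to promise this on a frozen snapshot, which matches the caveat in the text.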
Calybris is narrower than a rules engine. It does not try to provide a policy language. It does not parse arbitrary user rules. It does not evaluate scripts. The current kernel is closer to:

    rank candidates under hard constraints
    return the best positive-utility candidate
    otherwise reject

That narrowness is intentional. I wanted the core to be small enough to reason about, test, replay, and document. A larger product can put a policy language above this layer. Calybris is the deterministic bottom layer.

The project has tests for the parts I would worry about first:

The CI runs MSRV and stable jobs, clippy with warnings denied, docs, examples, proptest-heavy jobs, Loom, Miri, cargo-audit, and cargo-deny. That does not make it "audited". It does make it less hand-wavy.

To try the examples:

    git clone https://github.com/emirhuseynrmx/calybris-core
    cd calybris-core
    cargo run --example quickstart
    cargo run --example llm_routing
    cargo run --example replay_audit

Use it as a dependency:

    cargo add calybris-core

Kernel-only, without WAL:

    cargo add calybris-core --no-default-features

The current release is v0.3.10. Release notes: https://github.com/emirhuseynrmx/calybris-core/releases/tag/v0.3.10

The crate is Apache-2.0 and usable, but I would not describe it as a complete production platform. It is a core primitive. If you embed it in a production system, you still own:

I would especially like feedback from Rust, security, infra, and systems people on:

The repo is here: https://github.com/emirhuseynrmx/calybris-core