Enki is a memory engine for LLM agents. This repository publishes evaluation results only — the engine is closed-source. No configuration, internals, or methodology beyond what is described below is included here.
Both systems ingest identical conversation histories from LongMemEval-S. For each question, the answer is generated from each system's retrieved memories by the same model (Claude Haiku) and graded by the
**same** LLM-as-judge, at equal retrieval depth (K=10). The only variable is the memory layer.
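The protocol described above can be pictured as a small harness in which only the memory layer is swapped. This is a minimal sketch, not the harness used for these results: `MemorySystem`, `Instance`, `evaluate`, and the `answer` and `judge` callables are hypothetical names introduced for illustration, and creating a fresh store per instance is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence

K = 10  # retrieval depth stated in this README


class MemorySystem(Protocol):
    """Hypothetical interface; neither engine's real API is shown here."""

    def ingest(self, history: Sequence[str]) -> None: ...
    def retrieve(self, question: str, k: int) -> list[str]: ...


@dataclass
class Instance:
    history: Sequence[str]  # conversation turns to ingest
    question: str
    reference: str          # gold answer used by the judge


def evaluate(
    make_system: Callable[[], MemorySystem],
    instances: Sequence[Instance],
    answer: Callable[[str, list[str]], str],
    judge: Callable[[str, str, str], bool],
) -> int:
    """Return how many instances the judge marks correct."""
    correct = 0
    for inst in instances:
        system = make_system()  # fresh store per instance (assumption)
        system.ingest(inst.history)
        memories = system.retrieve(inst.question, K)
        predicted = answer(inst.question, memories)
        correct += int(judge(inst.question, inst.reference, predicted))
    return correct
```

Passing the same `answer` and `judge` callables to both runs is what keeps the memory layer as the only variable.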
**Validated slice: 25 instances** (full-benchmark run in progress).
| Question type | Enki | mem0 |
|---|---|---|
| Multi-session reasoning | 4 / 5 | 2 / 5 |
| Knowledge update | 3 / 5 | 3 / 5 |
| Single-session (user) | 3 / 5 | 3 / 5 |
| Single-session (assistant) | 2 / 5 | 2 / 5 |
| Single-session (preference) | 2 / 5 | 2 / 5 |
| Total | 14 / 25 | 12 / 25 |
Storage: Enki answers from 0.49× the stored facts mem0 keeps on the same conversations (mean 138 vs 283).

Standout: multi-session reasoning (4/5 vs 2/5).
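The totals and the storage ratio can be re-derived from the figures published above. The sketch below only restates those numbers; the dictionary layout and variable names are illustrative choices, not part of any released tooling.

```python
# Per-type correct counts copied from the results table (5 questions per type).
QUESTIONS_PER_TYPE = 5
scores = {
    "Multi-session reasoning": {"enki": 4, "mem0": 2},
    "Knowledge update": {"enki": 3, "mem0": 3},
    "Single-session (user)": {"enki": 3, "mem0": 3},
    "Single-session (assistant)": {"enki": 2, "mem0": 2},
    "Single-session (preference)": {"enki": 2, "mem0": 2},
}

# Sum each system's column to get its overall score.
totals = {
    name: sum(row[name] for row in scores.values())
    for name in ("enki", "mem0")
}
n_questions = QUESTIONS_PER_TYPE * len(scores)

# Mean stored facts per conversation, as reported above.
mean_facts = {"enki": 138, "mem0": 283}
storage_ratio = mean_facts["enki"] / mean_facts["mem0"]

print(totals, n_questions)     # {'enki': 14, 'mem0': 12} 25
print(f"{storage_ratio:.2f}")  # 0.49
```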
Honest framing. This is a small, hand-validated slice; the overall margin (14 vs 12) is modest and within the noise expected from a 25-item sample. The robust, repeatable result is comparable answer accuracy at roughly half the memory footprint, with a clear multi-session advantage. Further evaluation is ongoing.
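One way to see why a two-question margin is modest at this sample size is to put a confidence interval around each score. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not the methodology behind these results. It also treats the two scores as independent, although both systems answer the same questions; a paired comparison would need the per-question results.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


enki = wilson_interval(14, 25)
mem0 = wilson_interval(12, 25)

# Two intervals overlap when each one starts before the other ends.
overlap = enki[0] <= mem0[1] and mem0[0] <= enki[1]

print(f"Enki: {enki[0]:.2f} to {enki[1]:.2f}")
print(f"mem0: {mem0[0]:.2f} to {mem0[1]:.2f}")
print("intervals overlap:", overlap)  # True
```

With 25 items each interval spans several tenths of the accuracy scale, so the 14 vs 12 difference alone does not separate the two systems.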
Measured on a ~139-fact store, CPU-only (no GPU), 240 samples:
| Statistic | Latency (ms) |
|---|---|
| mean | 7.6 |
| p50 | 6.1 |
| p95 | 11.9 |
| p99 | 13.0 |
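For readers who want to produce a comparable summary for their own system, the statistics in the table can be computed from raw timings along these lines. This is a sketch under stated assumptions: `measure_latency_ms` and `summarize` are hypothetical helpers, the timed callable is a stand-in for whatever operation is being measured, and the interpolation method is a choice made here, since the README does not state how its percentiles were computed.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], samples: int = 240) -> list[float]:
    """Time `call` repeatedly and return per-call latency in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        call()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms: list[float]) -> dict[str, float]:
    """Mean plus p50/p95/p99, using inclusive percentile interpolation."""
    # quantiles(n=100) returns 99 cut points; index i-1 is the i-th percentile.
    cuts = statistics.quantiles(timings_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(timings_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the call you want to measure.
    timings = measure_latency_ms(lambda: sum(range(1000)))
    for name, value in summarize(timings).items():
        print(f"{name}: {value:.3f} ms")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, which matters when individual calls take only a few milliseconds.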
Full methodology and per-question results are available on request.
Enki Labs (UK) · 2026