# ON1 (G116 V8): 38μs Black-Box AI Memory Retrieval on Virtual Chip ISA

> Source: <https://github.com/ON1-Hao/ON1>
> Published: 2026-05-29 15:00:09+00:00

G116 v8: 38μs black-box AI memory retrieval on a virtual chip ISA, with latency separated into fetch, compute, and ANN search stages. A live tunnel endpoint is included below.

Unlike any conventional chip, G116 v8 introduces a quantum-inspired virtual ISA that makes memory, compute, and ANN search latency observable – not just a single opaque query time.

Built for the next generation of LLMs (llama.cpp, real‑time RAG, natural language grounding).

G116 v8 decomposes vector retrieval into three hardware‑visible stages, just like a quantum memory fabric:

- **Fetch Layer** – mmap‑based dataset mapping (zero‑copy, ~0.1–0.5 μs/op)
- **Compute Layer** – vector transformations (NumPy / BLAS, ~0.4–2 μs/op)
- **Search Layer** – ANN similarity (currently brute‑force, ~3–10 ms/op; FAISS/HNSW coming)
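The three stages above can be illustrated with a minimal sketch that times each one separately. This is not the ON1 implementation: the dataset file, the dimension `DIM`, the `timed` helper, and the choice of L2 normalisation as the "vector transformation" are all assumptions made for illustration, and measured numbers depend on the machine and will not necessarily match the figures quoted here.

```python
import os
import tempfile
import time

import numpy as np

N, DIM, K = 5000, 64, 3  # hypothetical sizes chosen for this sketch


def timed(fn):
    """Run fn() and return (result, elapsed time in microseconds)."""
    start = time.perf_counter_ns()
    result = fn()
    return result, (time.perf_counter_ns() - start) / 1_000.0


# Build a throwaway dataset file so the sketch is self-contained.
rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "vectors.f32")
rng.standard_normal((N, DIM), dtype=np.float32).tofile(path)
query = rng.standard_normal(DIM, dtype=np.float32)

# Fetch stage: map the file into memory without copying its contents.
data, fetch_us = timed(
    lambda: np.memmap(path, dtype=np.float32, mode="r", shape=(N, DIM))
)

# Compute stage: a vector transformation (assumed here: L2-normalise the query).
q, compute_us = timed(lambda: query / np.linalg.norm(query))


# Search stage: brute-force dot-product similarity over every row, then top-k.
def brute_force_top_k():
    scores = data @ q
    top = np.argpartition(-scores, K)[:K]   # unordered top-k candidates
    return top[np.argsort(-scores[top])]    # order them by descending score


top_k, search_us = timed(brute_force_top_k)

report = {"fetch_us": fetch_us, "compute_us": compute_us, "search_us": search_us}
print(report, top_k.tolist())
```

Timing each stage on its own is what lets a slow query be attributed to mapping, transformation, or search rather than to a single combined number.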

This is **not** another black‑box vector DB. It’s a *virtual chip ISA* that makes RAG bottlenecks transparent.

| Tier | Latency (per op) |
|---|---|
| Fetch | 0.1 – 0.5 μs |
| Compute | 0.4 – 2.0 μs |
| Search (brute) | 3 – 10 ms |
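As a quick worked example of what the table implies, the sketch below converts the upper bound of each tier to microseconds and computes each tier's share of the total. The tier names and the choice of upper bounds are illustrative assumptions; only the latency ranges come from the table.

```python
# Upper-bound latencies from the table, converted to microseconds.
tiers_us = {
    "fetch": 0.5,
    "compute": 2.0,
    "search_brute": 10_000.0,  # 10 ms
}

total_us = sum(tiers_us.values())
shares = {name: us / total_us for name, us in tiers_us.items()}
bottleneck = max(shares, key=shares.get)

for name, share in shares.items():
    print(f"{name:>12}: {tiers_us[name]:>10.1f} us  {share:8.3%}")
print("bottleneck:", bottleneck)
```

With these figures the brute-force search tier accounts for more than 99.9% of the per-operation time, which is why the search stage is the one slated for indexing.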

*(Next: FAISS indexing + GPU acceleration)*

Most systems (FAISS / Milvus / pgvector) only give you:

“query latency = X ms”

We give you:

memory latency → compute latency → retrieval latency

This is the **natural language latency breakdown** needed for real‑time LLM grounding with llama.cpp.

Our public verification endpoint is currently live. You can test the latency decomposition directly from your own terminal right now:

```
curl "https://5e776b15817fd1.lhr.life/query?mode=search&n=5000&k=3"
```
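The same request can be issued from Python using only the standard library. This is a hedged sketch: the host and the `mode`, `n`, and `k` parameter names are copied from the curl command, but the helper names `build_query_url` and `fetch_raw` are hypothetical, and because the response format is not documented here the body is returned as raw text rather than parsed. The tunnel host may also no longer be reachable.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Tunnel host and path taken from the curl command; it may be short-lived.
BASE_URL = "https://5e776b15817fd1.lhr.life/query"


def build_query_url(mode="search", n=5000, k=3):
    """Assemble the query URL with the same parameters as the curl example."""
    return f"{BASE_URL}?{urlencode({'mode': mode, 'n': n, 'k': k})}"


def fetch_raw(url, timeout=10.0):
    """Return the response body as text; no response schema is assumed."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


url = build_query_url()
print(url)
```

Calling `fetch_raw(url)` performs the actual request; it is left out of the top-level script so the sketch runs without network access.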


