[ARTICLE · art-17744] src=github.com pub= topic=ai-chips verified=true sentiment=↑ positive

ON1 (G116 V8): 38μs Black-Box AI Memory Retrieval on Virtual Chip ISA

ON1 (G116 V8) has introduced a virtual chip ISA that achieves 38-microsecond black-box AI memory retrieval by separating vector search into three observable latency stages: fetch, compute, and ANN search. The system, designed for real-time LLM grounding with llama.cpp, exposes memory, compute, and retrieval latencies individually rather than reporting a single opaque query time. A public verification endpoint is currently live for testing the latency decomposition.

1 min read · Published May 29, 2026

G116 v8: 38μs Black-box AI Memory Retrieval on Virtual Chip ISA (Latency-Separated Fetch/Compute/ANN) — Live Tunnel Inside

Unlike a conventional chip, G116 v8 introduces a quantum-inspired virtual ISA that makes memory, compute, and ANN search latency observable, not just a single opaque query time.

Built for next-generation LLM workloads (llama.cpp, real-time RAG, natural language grounding).

G116 v8 decomposes vector retrieval into three hardware‑visible stages, just like a quantum memory fabric:

Fetch Layer: mmap-based dataset mapping (zero-copy, ~0.1–0.5 μs/op)
Compute Layer: vector transformations (NumPy / BLAS, ~0.4–2 μs/op)
Search Layer: ANN similarity search (currently exact brute-force, ~3–10 ms/op; FAISS/HNSW coming)
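The three-layer split can be illustrated with a minimal sketch. It assumes a raw file of row-major float32 vectors and cosine similarity (the article names neither the file layout nor the metric). `np.memmap` stands in for the mmap-based fetch, NumPy for the compute layer, and an exact brute-force top-k (`np.argpartition`) for the current search layer. `build_dataset` and `retrieve` are hypothetical names, not G116's API.

```python
import os
import tempfile
import time

import numpy as np


def build_dataset(path: str, n: int, dim: int, seed: int = 0) -> None:
    """Write n random float32 vectors (row-major) to a raw binary file."""
    rng = np.random.default_rng(seed)
    rng.standard_normal((n, dim), dtype=np.float32).tofile(path)


def retrieve(path: str, n: int, dim: int, query: np.ndarray, k: int):
    """Return (top-k indices, their cosine scores, per-stage timings in seconds)."""
    timings = {}

    # Fetch: map the file into memory; no vector data is read yet (zero-copy).
    t0 = time.perf_counter()
    vectors = np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))
    t1 = time.perf_counter()
    timings["fetch_s"] = t1 - t0

    # Compute: normalise the query and compute dataset norms for cosine similarity.
    # This is the first pass over the mapped data, so page faults are charged here.
    q = (query / np.linalg.norm(query)).astype(np.float32)
    norms = np.linalg.norm(vectors, axis=1)
    t2 = time.perf_counter()
    timings["compute_s"] = t2 - t1

    # Search: exact brute-force scoring of every vector, then top-k by score.
    scores = (vectors @ q) / norms
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    t3 = time.perf_counter()
    timings["search_s"] = t3 - t2

    return top, scores[top], timings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vectors.f32")
        # n=5000 echoes the curl example (assuming n is the dataset size); dim is arbitrary.
        n, dim = 5000, 64
        build_dataset(path, n, dim)
        # Use a stored vector as the query, so index 42 should rank first.
        query = np.array(np.memmap(path, dtype=np.float32, mode="r", shape=(n, dim))[42])
        top, scores, timings = retrieve(path, n, dim, query, k=3)
        print(top, scores)
        for stage, seconds in timings.items():
            print(f"{stage}: {seconds * 1e6:.1f} us")
```

One design consequence: creating the mapping does not read the data, so in this sketch the cost of first touching the vectors is charged to the compute stage rather than to fetch. Where a system draws that boundary changes what each stage's number means.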

This is not another black‑box vector DB. It’s a virtual chip ISA that makes RAG bottlenecks transparent.

Tier                   Latency (per op)
Fetch                  0.1–0.5 μs
Compute                0.4–2.0 μs
Search (brute-force)   3–10 ms

(Next: FAISS indexing + GPU acceleration)
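Reading the table end to end makes the bottleneck explicit. Assuming, purely for illustration, that one query costs exactly one operation in each tier (the article quotes per-op figures, not a per-query operation count), the search tier's share of total latency at the upper and lower bounds, in μs, is:

```latex
\begin{aligned}
\text{share}_{\text{search}} &= \frac{t_{\text{search}}}{t_{\text{fetch}} + t_{\text{compute}} + t_{\text{search}}} \\
\text{upper bounds: } &\frac{10\,000}{0.5 + 2.0 + 10\,000} = \frac{10\,000}{10\,002.5} \approx 99.975\% \\
\text{lower bounds: } &\frac{3\,000}{0.1 + 0.4 + 3\,000} = \frac{3\,000}{3\,000.5} \approx 99.983\%
\end{aligned}
```

Under that assumption, fetch and compute together stay well below 0.1% of the total, which is why indexing and acceleration target the search tier first.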

Most systems (FAISS / Milvus / pgvector) only give you:

“query latency = X ms”

We give you:

memory latency → compute latency → retrieval latency

This is the per-stage latency breakdown needed for real-time LLM grounding with llama.cpp.
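The difference between one query-latency number and a per-stage breakdown is largely a matter of instrumentation. A minimal sketch, assuming nothing about G116's internals: `StageTimer` is a hypothetical helper that times named stages with a context manager, reports the median per stage, and can also collapse them into the single figure an opaque metric would show. The clock is injectable so the timer can be tested deterministically.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median
from typing import Callable, Dict, List


class StageTimer:
    """Accumulate per-stage latencies (hypothetical helper, not part of G116)."""

    def __init__(self, clock: Callable[[], int] = time.perf_counter_ns):
        self._clock = clock
        self._samples: Dict[str, List[int]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str):
        """Time the enclosed block and record it under `name`."""
        start = self._clock()
        try:
            yield
        finally:
            self._samples[name].append(self._clock() - start)

    def breakdown_ns(self) -> Dict[str, float]:
        """Median latency per stage, in nanoseconds."""
        return {name: median(vals) for name, vals in self._samples.items()}

    def total_ns(self) -> float:
        """Collapse the breakdown into one number (sum of per-stage medians,
        which is not in general the median end-to-end latency)."""
        return sum(self.breakdown_ns().values())


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(100):
        # Placeholder workloads standing in for fetch, compute, and search.
        with timer.stage("fetch"):
            data = list(range(1000))
        with timer.stage("compute"):
            data = [x * 2 for x in data]
        with timer.stage("search"):
            best = max(data)
    print(timer.breakdown_ns())
    print(timer.total_ns())
```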

Our public verification endpoint is currently live. You can test the latency decomposition directly from your own terminal right now:

curl "https://5e776b15817fd1.lhr.life/query?mode=search&n=5000&k=3"
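The same request can be issued from Python using only the standard library. This is a sketch: the response format is not documented in the article, so the body is returned as raw text, and reading `n` as the dataset size and `k` as the number of neighbours to return is an assumption based on the parameter names. `build_query_url` and `fetch_raw` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL as given in the article's curl example.
BASE_URL = "https://5e776b15817fd1.lhr.life/query"


def build_query_url(base: str, mode: str, n: int, k: int) -> str:
    """Build the query URL; the meanings of n and k are assumed, not documented."""
    return f"{base}?{urlencode({'mode': mode, 'n': n, 'k': k})}"


def fetch_raw(url: str, timeout: float = 10.0) -> str:
    """GET the endpoint and return the response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling `fetch_raw(build_query_url(BASE_URL, "search", 5000, 3))` sends the same GET request as the curl command.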