{"slug": "on1-g116-v8-38ms-black-box-ai-memory-retrieval-on-virtual-chip-isa", "title": "ON1 (G116 V8): 38μs Black-Box AI Memory Retrieval on Virtual Chip ISA", "summary": "ON1 (G116 V8) has introduced a virtual chip ISA that achieves 38-microsecond black-box AI memory retrieval by separating vector search into three observable latency stages: fetch, compute, and ANN search. The system, designed for real-time LLM grounding with llama.cpp, exposes memory, compute, and retrieval latencies individually rather than reporting a single opaque query time. A public verification endpoint is currently live for testing the latency decomposition.", "body_md": "G116 v8: 38μs Black-box AI Memory Retrieval on Virtual Chip ISA (Latency-Separated Fetch/Compute/ANN) - Live Tunnel Inside\n\nUnlike any conventional chip, G116 v8 introduces a quantum-inspired virtual ISA that makes memory, compute, and ANN search latency observable – not just a single opaque query time.\n\nBuilt for the next generation of LLMs (llama.cpp, real‑time RAG, natural language grounding).\n\nG116 v8 decomposes vector retrieval into three hardware‑visible stages, just like a quantum memory fabric:\n\n- **Fetch Layer** – mmap‑based dataset mapping (zero‑copy, ~0.1–0.5 μs/op)\n- **Compute Layer** – vector transformations (NumPy / BLAS, ~0.4–2 μs/op)\n- **Search Layer** – ANN similarity (currently brute‑force, ~3–10 ms/op; FAISS/HNSW coming)\n\nThis is **not** another black‑box vector DB. 
It’s a *virtual chip ISA* that makes RAG bottlenecks transparent.\n\n| Tier | Latency (per op) |\n|---|---|\n| Fetch | 0.1 – 0.5 μs |\n| Compute | 0.4 – 2.0 μs |\n| Search (brute) | 3 – 10 ms |\n\n*(Next: FAISS indexing + GPU acceleration)*\n\nMost systems (FAISS / Milvus / pgvector) only give you:\n\n“query latency = X ms”\n\nWe give you:\n\nmemory latency → compute latency → retrieval latency\n\nThis is the **natural language latency breakdown** needed for real‑time LLM grounding with llama.cpp.\n\nOur public verification endpoint is currently live. You can test the latency decomposition directly from your own terminal right now:\n\n```\ncurl \"https://5e776b15817fd1.lhr.life/query?mode=search&n=5000&k=3\"\n```\n\n", "url": "https://wpnews.pro/news/on1-g116-v8-38ms-black-box-ai-memory-retrieval-on-virtual-chip-isa", "canonical_source": "https://github.com/ON1-Hao/ON1", "published_at": "2026-05-29 15:00:09+00:00", "updated_at": "2026-05-29 15:17:33.439483+00:00", "lang": "en", "topics": ["ai-chips", "ai-infrastructure", "artificial-intelligence", "machine-learning", "large-language-models"], "entities": ["G116 v8", "llama.cpp", "FAISS", "Milvus", "pgvector", "NumPy", "BLAS", "HNSW"], "alternates": {"html": "https://wpnews.pro/news/on1-g116-v8-38ms-black-box-ai-memory-retrieval-on-virtual-chip-isa", "markdown": "https://wpnews.pro/news/on1-g116-v8-38ms-black-box-ai-memory-retrieval-on-virtual-chip-isa.md", "text": "https://wpnews.pro/news/on1-g116-v8-38ms-black-box-ai-memory-retrieval-on-virtual-chip-isa.txt", "jsonld": "https://wpnews.pro/news/on1-g116-v8-38ms-black-box-ai-memory-retrieval-on-virtual-chip-isa.jsonld"}}