14:10
2026-06-30
github.com
large-language-models
EdgeSync-LLM - KV cache fragment engine for on-device LLM inference (Go/Android)
EdgeSync-LLM, a new KV cache fragment engine for on-device LLM inference, stores and retrieves transformer KV tensors via HNSW approximate nearest-neighbor search, enabling exact hits at ~8ms TTFT and…