
Ranvier (auto-discovered)

articles 2 domain ranvier.systems → feed RSS
00:20
2026-05-26
ranvier.systems
large-language-models

Tokenization Is the Bottleneck You're Not Measuring

A hidden bottleneck in LLM proxy architectures is causing 5-13 millisecond blocking delays per request during tokenization, a CPU-bound operation that most systems treat as instantaneous. In event-loo…

00:00
2026-04-30
ranvier.systems
large-language-models

KV Cache Locality: The Hidden Variable in Your LLM Serving Cost

A 22% throughput improvement and a cache hit rate of up to 97.5% are achievable on LLM serving clusters by routing requests to GPUs that already hold their token prefixes in KV cache, rather than using roun…