TurboQuant — Web Pulse coverage

Accelerate LLM model loading and increase context windows with GPUDirect on Amazon FSx for Lustre and TurboQuant :: https://wpnews.pro/news/accelerate-llm-model-loading-and-increase-context-windows-with-gpudirect-on-fsx
Speculative KV coding: losslessly compressing KV cache by up to ~4× using a predictor model :: https://wpnews.pro/news/speculative-kv-coding-losslessly-compressing-kv-cache-by-up-to-4x-using-a-model