17:15
2026-06-16
github.com
large-language-models
Sors: a Rust proxy that reorders prompts to maximize vLLM prefix cache hits
A new Rust-based reverse proxy called Sors reorders prompt content to maximize prefix cache hits in LLM inference engines like vLLM and SGLang, improving latency by placing static content before dynam…
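
The idea in the summary can be illustrated with a small sketch. This is not Sors's code or API: the `Segment` type, its `static` flag, and the `reorder`, `render`, `build`, and `shared_prefix_len` helpers are all hypothetical names made up for illustration, and a character-level common prefix stands in for the token-level prefix that an inference engine would actually match against its cache.

```python
from dataclasses import dataclass
from os.path import commonprefix


@dataclass(frozen=True)
class Segment:
    text: str
    static: bool  # hypothetical flag: True if the text is identical across requests


def reorder(segments):
    # Move static segments to the front. sorted() is stable, so the
    # relative order inside each group is preserved.
    return sorted(segments, key=lambda s: not s.static)


def render(segments):
    return "\n".join(s.text for s in segments)


def shared_prefix_len(a, b):
    # Character-level stand-in for the token-level prefix a cache could reuse.
    return len(commonprefix([a, b]))


def build(user_query):
    # A prompt whose dynamic part (the question) comes first, which
    # makes two otherwise similar requests diverge almost immediately.
    return [
        Segment(f"Question: {user_query}", static=False),
        Segment("You are a helpful assistant.", static=True),
        Segment("Reference: <long shared document>", static=True),
    ]


a = build("What is vLLM?")
b = build("How does SGLang batch?")

before = shared_prefix_len(render(a), render(b))
after = shared_prefix_len(render(reorder(a)), render(reorder(b)))
print(f"shared prefix before reordering: {before} chars")
print(f"shared prefix after reordering:  {after} chars")
```

With the static segments first, the two prompts share everything up to the point where the questions differ, so a prefix cache has far more to reuse. Reordering can change how a model reads a prompt, so a real proxy would presumably need rules about which segments are safe to move; that part is not modeled here.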