Sors: a Rust proxy that reorders prompts to maximize vLLM prefix cache hits

A new Rust-based reverse proxy called Sors reorders prompt content to maximize prefix cache hits in LLM inference engines such as vLLM and SGLang. It improves latency by placing static content before dynamic elements. The proxy supports tag-based and auto-detect modes, and benchmarks show significant speedups for cached prompts.

A minimal reverse proxy that reorders prompt content to maximize prefix cache hits in LLM inference engines (vLLM, SGLang, or any OpenAI-compatible backend with prefix caching enabled).

vLLM's Automatic Prefix Caching reuses cached KV blocks only when the token sequence matches exactly from position 0: each block is keyed on its own tokens plus every token before it. If volatile content (timestamps, request IDs) appears before a large static block, that static block's prefix changes on every request, so its cache entries never match.

sors intercepts API requests, classifies prompt blocks as static, dynamic, or unknown, and reorders them so stable content occupies the prefix position, maximizing cache reuse.

| Mode | Trigger | Mechanism |
|---|---|---|
| Tag-based |