RadixAttention

mentions: 2, type: Organization

// recent coverage 2 mentions

00:00

2026-07-15

dibi8.com

artificial-intelligence

SGLang — Structured Generation and Fast LLM Serving Engine

SGLang, an open-source LLM inference engine, combines RadixAttention for prefix caching with grammar-constrained decoding, reporting a 25x throughput improvement over vLLM on structured-output tasks. …

08:21

2026-06-03

letsdatascience.com

large-language-models

Trellis Introduces RadixAttention KV Prefix Cache

Trellis introduced RadixAttention, a radix-tree-based KV cache designed to accelerate the prefill phase of LLM inference for chat and agentic sessions. The system stores shared string prefixes compact…

// co-occurs with top 6 entities

Trellis (1), SGLang (1), vLLM (1), FlashAttention-3 (1), Llama-3.2-8B-Instruct (1), Llama-3.2-70B-Instruct (1)