
vLLM

mentions 154 · type Organization

// recent coverage 154 mentions

08:45
2026-06-16
thecomputersciencebook.com
large-language-models

PagedAttention is more than virtual memory

PagedAttention, a memory optimization technique in the vLLM inference server, applies virtual memory concepts to manage the KV cache in large language models, improving throughput by reducing fragment…

05:21
2026-06-16
letsdatascience.com
large-language-models

CacheWise Improves KVCache Reuse for LLM Coding Agents

Researchers introduced CacheWise, a KVCache management layer for LLM coding agents, reducing evictions by 2-2.6x and improving session completion time by up to 3.5x in vLLM, according to a June 2026 a…

21:59
2026-06-15
github.com
ai-agents

Show HN: Phlox – Open-source self-hosted agentic web chat

Phlox, an open-source self-hosted agentic web chat application, has been released on GitHub. It supports any model provider including AWS Bedrock and OpenAI-compatible endpoints, and features agentic …

04:43
2026-06-14
github.com
large-language-models

Forked TensorZero after it was archived after raising $7.3M

Agentify has forked the archived TensorZero project, which raised $7.3M, and released Agentify Gateway, an open-source LLM gateway with observability, evaluation, optimization, and experimentation fea…

01:13
2026-06-14
byteiota.com
large-language-models

DiffusionGemma: Google’s 4x Faster Text Diffusion Model

Google DeepMind released DiffusionGemma on June 10, 2026, a 26B open-weight text diffusion model that generates 256 tokens simultaneously, achieving up to 1,008 tokens per second on an H100—4-5x faste…

17:51
2026-06-12
testingcatalog.com
artificial-intelligence

MiniMax M3 launches on NVIDIA platform with Free Endpoint

MiniMax released its M3 multimodal model on NVIDIA's accelerated infrastructure, offering a free public endpoint via NVIDIA's API catalog. The 428-billion-parameter model processes text, images, and v…

17:20
2026-06-11
developers.googleblog.com
large-language-models

DiffusionGemma: The Developer Guide

Google has released DiffusionGemma, an experimental text-generation model built on the Gemma 4 architecture that generates text in parallel blocks rather than token-by-token, enabling faster inference…

// co-occurs with top 8 entities