
vLLM

mentions 154 type Organization

// recent coverage 154 mentions

20:44
2026-07-03
tigera.io
ai-agents

Six AI agent SDKs for enterprise Kubernetes, compared

Six AI agent SDKs—LangGraph, CrewAI, Google ADK, and others—are compared for enterprise Kubernetes deployment, with most being model-agnostic and containerizable for on-premise use, though Anthropic's…

09:07
2026-07-01
glukhov.org
large-language-models

Speculative Decoding: 20-50% Faster LLM Inference

Speculative decoding accelerates large language model inference by 20-50% without quality loss, using a draft-verify mechanism that generates multiple tokens per forward pass. The technique amortizes …

03:09
2026-07-01
byteiota.com
large-language-models

MiniMax M3: Open-Weight Model That Beats GPT-5.5 on Coding

MiniMax released M3, a 428-billion-parameter open-weight model, on June 7, achieving 59.0% on SWE-Bench Pro—slightly outperforming GPT-5.5's 58.6%—at $0.30 per million input tokens, making it 16 times…

20:04
2026-06-30
letsdatascience.com
large-language-models

Article Compares Continuous and Static Batching in LLM Inference

A new article compares continuous batching and static batching in LLM inference, explaining how techniques in vLLM and TGI improve throughput and reduce latency. The choice of batching strategy affect…

00:00
2026-06-30
jasonrobert.dev
artificial-intelligence

News Summary for June 30, 2026

Agentic AI systems are maturing from prototypes into production-grade infrastructure, with vLLM's Micro-Agent framework demonstrating that serving-layer orchestration can match or beat frontier models…

00:00
2026-06-30
aclanthology.org
artificial-intelligence

CUHKSZ Simultaneous Speech Translation System for IWSLT 2026

The CUHKSZ team submitted a simultaneous speech translation system to IWSLT 2026, built on Qwen3-Omni-30B-A3B with LoRA adaptation, achieving 40.5 BLEU for English→Chinese and 27.7 BLEU for English→Ge…

00:46
2026-06-28
github.com
artificial-intelligence

AMD Strix Halo RDMA Cluster Setup Guide

AMD Strix Halo cluster setup guide details how to configure a two-node system linked via Intel E810 RoCE v2 for distributed vLLM inference using Tensor Parallelism. The guide covers hardware prerequis…

15:27
2026-06-27
cefboud.com
large-language-models

Distributed LLM Inference with LLM-d

A new open-source tool called llm-d acts as an LLM-aware load balancer for distributed inference, intelligently routing requests across vLLM instances based on KV cache locality and GPU utilization. B…

10:10
2026-06-27
dev.to
ai-agents

DeerFlow 2.0 Review: ByteDance's Open SuperAgent Harness

ByteDance open-sourced DeerFlow 2.0, a long-horizon agent runtime that orchestrates sub-agents, sandboxes, persistent memory, and an extensible skill system. The project reached 74,960 GitHub stars an…

08:06
2026-06-27
github.com
ai-tools

Show HN: Brytlog – AI logger

Developer released Brytlog, an open-source AI logger that replaces raw terminal output with concise AI summaries to save developers time and money. The tool acts as a pre-processor for agentic workflo…

22:35
2026-06-26
cmart.blog
large-language-models

Inference Cards

A new plaintext markup format called Inference Cards aims to standardize how self-hosted LLM performance claims are communicated, requiring details like model variant, quantization, hardware, inferenc…

// co-occurs with top 8 entities