grep -l @vllm /news/*.json | wc -l → 154

vLLM

mentions 154 type Organization

// recent coverage 154 mentions

11:47
2026-06-21
vettedconsumer.com
large-language-models

Show HN: Local LLM Hardware Calculator

A new Local LLM Hardware Calculator helps users estimate memory requirements for running large language models on their own hardware, factoring in weights, KV cache, and overhead. The tool also compar…

14:24
2026-06-20
news.ycombinator.com
ai-infrastructure

How to become an AI infrastructure engineer?

An AI infrastructure engineer at a major industrial company seeks advice on transitioning from SRE-focused work to a proper software engineering role in AI infrastructure, asking for skills, resources…

01:36
2026-06-20
dev.to
large-language-models

KV cache and PagedAttention: what they do and why they matter

A developer explains that the KV cache is the biggest operational bottleneck in production LLM serving on GPUs, consuming more memory than model weights for workloads with high concurrency or long con…

00:17
2026-06-20
modal.com
large-language-models

Speculation Is All You Need

Modal Labs released state-of-the-art DFlash speculators for Qwen 3.5 and Qwen 3.6 models on Hugging Face, achieving 5-20% additional speedups and enabling Qwen 3.5 122B-A10B to run at over 1000 tok/s …

15:00
2026-06-19
hiraditya.github.io
large-language-models

Building vLLM from Source: A Field Guide (with all the pitfalls)

A developer building vLLM from source on an AWS g5 instance with Ubuntu 26.04 and Python 3.14 encountered multiple version-skew, driver, and toolchain issues, including a pitfall where missing nvidia-…

10:44
2026-06-19
discuss.huggingface.co
large-language-models

Gemma 4 bug fixes and Research Request

A critical bug in Google's Gemma 4 causes it to malform tool calls under real load, affecting vLLM, llama.cpp, Ollama, and oobabooga. A developer open-sourced a diagnosis, repair, and experimental LoR…

00:00
2026-06-19
depot.dev
developer-tools

Now available: SOCI v2 support for Depot container builds

Depot now supports SOCI v2 for container builds, enabling lazy-pulling of images to drastically reduce startup times. The feature generates a SOCI index during the build process, allowing containers t…

18:18
2026-06-18
dev.to
developer-tools

IDE fixes, TS 5.9 beta, Claude tool use explained

The Continue plugin v1.2.20 patches memory leaks, unhandled exceptions, and JCEF message chunking crashes across JetBrains and VS Code adapters, fixing crash vectors that cause sidebar hangs and autoc…

16:29
2026-06-18
devashish.me
large-language-models

Two Qwen3 models on one DGX Spark: the residency math

Alibaba's Qwen3-80B and Qwen3-4B models were successfully co-located on a single NVIDIA DGX Spark using vLLM containers behind a LiteLLM proxy, but the 80B model's inability to emit tool calls in auto…

16:14
2026-06-18
dev.to
artificial-intelligence

7 Open-Source AI Projects Developers Need [June 2026]

Seven open-source AI projects—Ollama, Open WebUI, Browser Use, vLLM, Unsloth, CrewAI, and Continue—are reshaping production software development in June 2026. Ollama, with 174,000+ GitHub stars, now o…

10:16
2026-06-18
dev.to
large-language-models

What GLM-5.2 Changes for Long-Horizon Coding

Zhipu AI released GLM-5.2, a large language model with a 1M-token context window, flexible effort levels, and an MIT license, targeting long-horizon coding tasks. The model introduces IndexShare, an a…

09:00
2026-06-18
anyscale.com
large-language-models

High Performance Distributed Inference with Ray Serve LLM

Ray Serve LLM, in partnership with Google Kubernetes Engine, announced major performance improvements achieving up to 4.4x higher throughput on prefill-heavy workloads and 24x higher on decode-heavy w…

// co-occurs with top 8 entities