grep -l @vllm /news/*.json | wc -l → 154

vLLM

mentions 154 · type Organization

// recent coverage 154 mentions

20:01
2026-06-26
pub.towardsai.net
large-language-models

GOSIM Paris: This Is What Open Source AI Looks Like in 2026

GOSIM Paris 2025, held at Station F on May 5-6, showcased open-source AI developments including LLMs advancing in mathematical reasoning, a call for transparency over speed, and the introduction of Ta…

13:10
2026-06-26
byteiota.com
ai-infrastructure

DGX Spark June 2026: Four Nodes, 700B Models Locally

NVIDIA's June 2026 DGX Spark update introduces automated four-node clustering via Cluster Assistant, enabling local inference of models up to 700B parameters. The update also delivers a 2.6x throughpu…

12:26
2026-06-26
3hcloud.com
ai-agents

How to Set Up and Deploy an OpenClaw AI Agent on a VPS

A new guide walks users through deploying an OpenClaw AI agent on a virtual private server, balancing cost, availability, and privacy. The tutorial covers server configuration, system requirements, an…

10:30
2026-06-26
aazar.me
large-language-models

Stop generating what you already have

A developer reduced LLM extraction latency from 42 seconds to 6 seconds by replacing verbatim text copying with pointer-based extraction and splitting a single large call into multiple parallel calls.…

20:42
2026-06-25
huggingface.co
ai-infrastructure

Run a vLLM Server on HF Jobs in One Command

Hugging Face launched a one-command method to run a vLLM server on its Jobs infrastructure, enabling users to quickly deploy models for testing, evaluation, or batch generation. The feature uses the o…

12:04
2026-06-25
devclubhouse.com
large-language-models

The Real Cost of the Open-Weight Price Collapse

The launch of Z.ai's GLM 5.2 and DeepSeek V4 Flash has created a 50x price gap between open-weight APIs and closed frontier models, reshaping the build-versus-buy calculus for developers. While open-w…

11:08
2026-06-25
flama.dev
large-language-models

LLM APIs with built-in chatbot in 1 line of code

Flama 2.0 introduces a CLI tool that allows users to download, package, and serve large language models from HuggingFace with a single command, including a built-in chat interface and production-ready…

10:01
2026-06-25
discuss.huggingface.co
large-language-models

Deepseek? Qwen?

A single H200 GPU with 141GB HBM3e cannot comfortably run DeepSeek V4 Flash (284B total, 13B active parameters) due to VRAM constraints, even with 2TB system RAM for offloading. The model requires an …

09:50
2026-06-25
oracomputing.com
large-language-models

ORA: Smaller Models. Same Intelligence

Ora Computing launched an automated LLM compression engine that reduces model size by up to 70% with minimal accuracy loss, enabling deployment on edge devices, on-prem servers, or cloud infrastructur…

20:51
2026-06-24
blog.crossplane.io
ai-infrastructure

I built a fleet-scale inference control plane using Crossplane

A developer built Modelplane, an open-source inference control plane using Crossplane, to manage GPU fleets across clouds, neoclouds, and on-premise environments. The platform allows platform teams to…

18:00
2026-06-23
research.ibm.com
artificial-intelligence

Running AI on mixed hardware for speed and affordability

IBM Research, Red Hat, and NxtGen Cloud Technologies demonstrated that using llm-d to serve AI models on mixed GPU hardware can boost inference speeds by 3 to 5 times and double throughput, enabling e…
