15:27
2026-06-27
cefboud.com
large-language-models
Distributed LLM Inference with LLM-d
A new open-source tool called llm-d acts as an LLM-aware load balancer for distributed inference, intelligently routing requests across vLLM instances based on KV cache locality and GPU utilization. B…