vLLM — Web Pulse coverage

MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs :: https://wpnews.pro/news/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host
End-to-End Observability for vLLM and TGI: from DCGM to Tokens :: https://wpnews.pro/news/end-to-end-observability-for-vllm-and-tgi-from-dcgm-to-tokens
Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding :: https://wpnews.pro/news/supercharging-llm-inference-on-google-tpus-achieving-3x-speedups-with-diffusion
Ollama vs llama.cpp vs vLLM: Which Should You Use in 2026? :: https://wpnews.pro/news/ollama-vs-llama-cpp-vs-vllm-which-should-you-use-in-2026
vLLM and PyTorch Work Together to Improve the Developer Experience on aarch64 :: https://wpnews.pro/news/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64
vLLM V0 to V1: Correctness Before Corrections in RL :: https://wpnews.pro/news/vllm-v0-to-v1-correctness-before-corrections-in-rl
SMG: The Case for Disaggregating CPU from GPU in LLM Serving :: https://wpnews.pro/news/smg-the-case-for-disaggregating-cpu-from-gpu-in-llm-serving
gemma 4 chat template that works with opencode - download the .jinja file and tell vllm to use it via `--chat-template chat_template_gemma_large_fixed.jinja` :: https://wpnews.pro/news/gemma-4-chat-template-that-works-with-opencode-download-the-jinja-file-and-tell