# Run a vLLM Server on HF Jobs in One Command

Hugging Face launched a one-command method to run a vLLM server on its Jobs infrastructure, enabling users to quickly deploy models for testing, evaluation, or batch generation. The feature uses the official vLLM Docker image, exposes a public proxy URL gated by Hugging Face tokens, and bills per minute of hardware usage.

It's the quickest way to stand up a model for tests, evals, or batch generation. If you're after a managed, production-ready service instead, that's what [Inference Endpoints](https://huggingface.co/docs/inference-endpoints) are for (more on [when to pick which](#hf-jobs-or-inference-endpoints) at the end). Here's the whole thing end to end.

## Prerequisites

- A payment method or a positive prepaid credit balance (Jobs is billed per minute by hardware usage).
- `huggingface_hub >= 1.20.0`: `pip install -U "huggingface_hub>=1.20.0"`.
- Logged in locally: `hf auth login`.

## Launch the server

`hf jobs run` is `docker run` for HF infrastructure. We use the official `vllm/vllm-openai` image, ask for a GPU with `--flavor`, and expose vLLM's port with `--expose`:

```bash
hf jobs run --flavor a10g-large --expose 8000 --timeout 2h \
    vllm/vllm-openai:latest \
    vllm serve Qwen/Qwen3-4B --host 0.0.0.0 --port 8000
```

`--expose 8000` routes the container's port through HF's public jobs proxy (see the [Serve Models guide](https://huggingface.co/docs/hub/jobs-serving) for the full reference). The command prints the URL your server is reachable at:

```
✓ Job started
id: 6a381ca1953ed90bfb947332
url: https://huggingface.co/jobs/qgallouedec/6a381ca1953ed90bfb947332
Hint: Exposed ports are reachable at (requires an HF token with read access to the job):
  https://6a381ca1953ed90bfb947332--8000.hf.jobs
```

`6a381ca1953ed90bfb947332` is your job ID. Keep track of it, we'll need it. We'll use