# Building vLLM from Source: A Field Guide (with all the pitfalls)

A step-by-step field guide to building vLLM from source on Ubuntu 26.04, covering Python 3.14 compatibility, CUDA driver issues, and toolchain pitfalls.

Building vLLM [1] from source sounds like it is one `pip install -e .` away. In practice, on a fresh machine with a recent OS and a recent Python, you hit a chain of version-skew, driver, and toolchain issues that each fail with a cryptic message.

This post walks through a real end-to-end build on an AWS g5 instance (NVIDIA A10G) running Ubuntu 26.04 + Python 3.14, documenting every error encountered and the fix. The target was a CUDA build of a vLLM fork. The same playbook applies to a stock `vllm-project/vllm` checkout.

## TL;DR: the working recipe

```bash
# 1. Confirm you actually have a GPU (see "Pitfall 1": easy to get wrong)
lspci | grep -i nvidia        # hardware present?
nvidia-smi                    # driver working?

# 2. Driver (if nvidia-smi fails but lspci shows the GPU)
sudo apt-get install -y nvidia-driver-575-open nvidia-modprobe dkms
sudo modprobe -r nouveau && sudo modprobe nvidia   # or reboot

# 3. Virtual env
python3 -m venv ~/go/venv && source ~/go/venv/bin/activate
pip install --upgrade pip

# 4. CUDA torch + a CONSISTENT pip CUDA toolkit (critical: one minor version)
pip install torch==2.11.0 torchvision==0.26.0 torchaudio==2.11.0   # default index = CUDA build
pip install "cuda-toolkit[nvcc]==13.3.0" "nvidia-cuda-runtime==13.3.29" \
            "nvidia-cuda-nvrtc==13.3.33" "nvidia-cublas==13.3.0.5"

# 5. Assemble CUDA_HOME from the pip layout (echo expands the glob once, up front)
export CUDA_HOME=$(echo $VIRTUAL_ENV/lib/python3.*/site-packages/nvidia/cu13)
ln -sfn $CUDA_HOME/lib $CUDA_HOME/lib64
( cd $CUDA_HOME/lib && for f in lib*.so.*; do ln -sf "$f" "${f%%.so.*}.so"; done )   # subshell keeps your cwd
mkdir -p $CUDA_HOME/lib/stubs
ln -sf /usr/lib/x86_64-linux-gnu/libcuda.so $CUDA_HOME/lib/stubs/libcuda.so

# 6. Build (scope arch to YOUR GPU; A10G is sm_86)
export PATH=$CUDA_HOME/bin:$PATH CUDACXX=$CUDA_HOME/bin/nvcc
export VLLM_TARGET_DEVICE=cuda TORCH_CUDA_ARCH_LIST="8.6+PTX"
export MAX_JOBS=12 NVCC_THREADS=2
export CMAKE_ARGS="-DCUDAToolkit_ROOT=$CUDA_HOME -DCMAKE_CUDA_COMPILER=$CUDA_HOME/bin/nvcc"
pip install -v -e . --no-build-isolation
```

Read on for why each line is there and what breaks without it.

## Prerequisites & how to check them

Before anything else, take an inventory. Getting this wrong wastes the most time, and it includes the most embarrassing pitfall of all.

| Requirement | How to check | Notes |
|---|---|---|
| A GPU (and which one) | `lspci \| grep -i nvidia` | Determines CUDA vs CPU build. Don’t trust `nvidia-smi` alone; see Pitfall 1. |
| GPU driver loaded | `nvidia-smi` | If it fails but `lspci` shows a GPU, the driver isn’t installed/loaded. |
| Compute capability | `nvidia-smi --query-gpu=compute_cap --format=csv` | A10G = 8.6. You build kernels for this. |
| CPU flags (CPU build only) | `lscpu \| grep -oE 'avx512f\|avx2'` | vLLM CPU wants AVX512; AVX2 works with limited features. |
| Compiler | `gcc --version` | vLLM recommends gcc 12–13; newer (15) mostly works but watch nvcc host-compiler limits. |
| Python | `python3 --version` | Check the repo’s `requires-python` in `pyproject.toml`. |
| RAM / cores | `nproc; free -h` | CUDA compiles are RAM-hungry (~2–3 GB per parallel job). |
| Build tools | `cmake --version; ninja --version` | vLLM needs cmake ≥ 3.26. |

### Pitfall 1: “There’s no GPU here” (when there definitely is)

This one cost us a whole CPU build. The very first check was:

```bash
nvidia-smi    # → command not found
```

Conclusion drawn: no GPU, do a CPU build.
Wrong. A missing `nvidia-smi` only means the driver/userspace tools aren’t installed; it says nothing about the hardware. The actual hardware check is:

```bash
$ lspci | grep -i nvidia
00:1e.0 3D controller: NVIDIA Corporation GA102GL [A10G] (rev a1)
```

The A10G was there the whole time; it just had no driver. Always check `lspci` (or `/proc/driver/nvidia`, `ls /dev/nvidia*`) before concluding “no GPU.” On cloud instances that aren’t “Deep Learning AMIs,” a bare GPU with no driver is the norm, not the exception.

Lesson: `lspci` detects hardware. `nvidia-smi` detects a working driver. They answer different questions. Decide CPU-vs-GPU from `lspci`.

## Step 2: Install and load the NVIDIA driver

`lspci` shows the GPU, `nvidia-smi` is missing → install the driver.

```bash
sudo apt-get update
sudo apt-get install -y dkms build-essential linux-headers-$(uname -r) \
    nvidia-driver-575-open
```

We used the open-kernel variant (`-open`), which is NVIDIA’s recommendation for Ampere and newer (A10G is Ampere). The 575 metapackage pulled driver 580.159.03.

### Pitfall 2: `modprobe nvidia` → “No such device” (nouveau owns the GPU)

```bash
$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

$ dmesg | grep NVRM
NVRM: GPU 0000:00:1e.0 is already bound to nouveau.
```

The open-source nouveau driver grabs the GPU at boot. The NVIDIA module can’t bind while nouveau holds it.

Fix: blacklist, unbind, and load:

```bash
echo -e "blacklist nouveau\noptions nouveau modeset=0" | \
  sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo -n "0000:00:1e.0" | sudo tee /sys/bus/pci/drivers/nouveau/unbind
sudo rmmod nouveau
sudo modprobe nvidia
sudo update-initramfs -u   # make the blacklist survive reboots
```

If `rmmod nouveau` complains it’s in use (e.g. by a display manager), a reboot after the blacklist + initramfs update achieves the same thing cleanly.

### Pitfall 3: `nvidia-smi` works but CUDA returns error 999 “unknown error”

This is the subtle one.
After loading the module:

```bash
$ nvidia-smi     # works, shows the A10G
$ python -c "import torch; print(torch.cuda.is_available())"
RuntimeError: CUDA unknown error ... False
```

A direct driver-API probe confirmed the runtime was broken even though `nvidia-smi` was fine:

```python
import ctypes
ctypes.CDLL("libcuda.so.1").cuInit(0)   # → 999 (CUDA_ERROR_UNKNOWN)
```

Two distinct causes, both worth knowing:

1. Stale/incorrect UVM device nodes. `nvidia-smi` uses `/dev/nvidia0` + `/dev/nvidiactl` (major 195). CUDA additionally needs `/dev/nvidia-uvm`. After a manual driver bring-up those nodes can be missing or have the wrong major. Recreate them against `/proc/devices`:

   ```bash
   sudo modprobe nvidia_uvm
   UVM_MAJOR=$(grep nvidia-uvm /proc/devices | awk '{print $1}')
   sudo rm -f /dev/nvidia-uvm /dev/nvidia-uvm-tools
   sudo mknod -m 666 /dev/nvidia-uvm c $UVM_MAJOR 0
   sudo mknod -m 666 /dev/nvidia-uvm-tools c $UVM_MAJOR 1
   ```

2. `nvidia-modprobe` is not installed. This setuid helper is what the CUDA runtime shells out to in order to create/initialize device nodes for non-root processes. Without it, raw `cuInit` may pass but torch’s runtime init throws 999. This was the actual fix for us:

   ```bash
   sudo apt-get install -y nvidia-modprobe
   sudo nvidia-modprobe -c 0 -u
   ```

After this: `torch.cuda.is_available()` → `True`. A reboot also installs the proper udev rules and avoids the manual `mknod` dance, but if you can’t reboot, the two steps above get you there.

Lesson: `nvidia-smi` working ≠ CUDA working. They use different device nodes. If `cuInit` returns 999, look at `/dev/nvidia-uvm` and make sure `nvidia-modprobe` exists.

## Step 3: The virtual environment

Nothing exotic here, but keep it isolated from system Python:

```bash
python3 -m venv ~/go/venv
source ~/go/venv/bin/activate
pip install --upgrade pip
```

We used Python 3.14. Check the repo supports it:

```bash
grep requires-python pyproject.toml
# requires-python = ">=3.10,<3.15"   ✅ 3.14 allowed
```

It built fine: `torch==2.11.0` and every dependency had cp314 wheels.
But see Pitfall 6: a bundled submodule had its own narrower Python check.

## Step 4: CUDA torch + a consistent CUDA toolkit

vLLM compiles `.cu` kernels, so it needs `nvcc`, which PyTorch wheels do not bundle (they ship runtime libraries only). You have two options:

- Install the full CUDA toolkit to `/usr/local/cuda` via NVIDIA’s apt repo, or
- Assemble a toolkit entirely from pip wheels.

We went pip-only (no apt repo for Ubuntu 26.04 yet, and it keeps everything in the venv).

First, the CUDA build of torch:

```bash
pip install torch==2.11.0 torchvision==0.26.0 torchaudio==2.11.0
python -c "import torch; print(torch.version.cuda)"   # → 13.0 (wheel tag: 2.11.0+cu130)
```

Then `nvcc` and the dev components via the modern unified meta package:

```bash
pip install "cuda-toolkit[nvcc]==13.3.0"
```

### Pitfall 4: the `nvidia-cuda-nvcc-cu13` package is a stub

The old naming is a trap:

```bash
$ pip install nvidia-cuda-nvcc-cu13
ERROR: ... (from versions: 0.0.0a0, 0.0.1)   # placeholder only
```

The real compiler ships via the `cuda-toolkit[nvcc]` extra (which pulls `nvidia-cuda-nvcc`, `nvidia-nvvm`, `nvidia-cuda-crt`). Use the meta package’s extras, not the `-cu13` standalone names.

### Pitfall 5: CUDA toolkit version skew (three separate failures)

This was the single biggest time sink. The pip CUDA ecosystem is split across many packages (`nvidia-cuda-nvcc`, `nvidia-nvvm`, `nvidia-cuda-crt`, `nvidia-cuda-cccl`, `nvidia-cuda-runtime`, `nvidia-cublas`, …) and pip will happily install mismatched minor versions. Each mismatch fails differently:

5a. ptxas can’t assemble newer PTX:

```
ptxas fatal   : Unsupported .version 9.3; current version is '9.0'
```

The nvcc front-end was 13.3 (emits PTX 9.3) but ptxas was 13.0 (≤ PTX 9.0). → Align them.

5b. CMake refuses on nvcc-vs-headers mismatch (PyTorch’s `cuda.cmake`):

```
CMake Error: FindCUDA says CUDA version is 13.3 (from nvcc),
but the CUDA headers say the version is 13.0.
```

5c.
flashinfer’s bundled cccl refuses at runtime (in its JIT compiler):

```
cccl/.../cuda_toolkit.h:41: error: "CUDA compiler and CUDA toolkit headers are
incompatible, please check your include paths"
```

The cccl check requires the minor version in `CUDART_VERSION` to exactly equal nvcc’s minor version.

The fix for all three: pin the entire CUDA userspace to one minor version.

Why 13.3 and not 13.0 (to match torch’s `cu130`)? Because CUDA 13.0 headers don’t compile on glibc 2.43 (Ubuntu 26.04):

```
/usr/include/.../mathcalls.h:206: error: exception specification is
incompatible with that of previous function "rsqrt"
```

CUDA 13.1+ headers fixed this. So we align up to 13.3. torch built for cu130 still runs on a 13.3 runtime thanks to CUDA 13 minor-version compatibility (any 13.x toolkit runs on an R580+ driver).

```bash
pip install "cuda-toolkit==13.3.0" "nvidia-cuda-runtime==13.3.29" \
  "nvidia-cuda-nvcc==13.3.33" "nvidia-nvvm==13.3.33" \
  "nvidia-cuda-crt==13.3.33" "nvidia-cuda-cccl==13.3.3.3.1" \
  "nvidia-cuda-nvrtc==13.3.33" "nvidia-cublas==13.3.0.5"

# verify nvcc and headers agree:
nvcc --version | grep release                                 # 13.3
grep CUDART_VERSION $CUDA_HOME/include/cuda_runtime_api.h     # 13030 = 13.3
```

pip prints a dependency-conflict warning (torch pins `cuda-toolkit==13.0.2`). It’s cosmetic; torch runs fine via minor-version compat. But beware: reinstalling vLLM later re-pulls its `requirements/cuda.txt` and silently downgrades the runtime back to 13.0, breaking flashinfer’s JIT again. Re-run the 13.3 pins after any reinstall.

## Step 5: Assemble a working CUDA_HOME

The pip wheels lay CUDA out under `.../site-packages/nvidia/cu13/{bin,include,lib}`, which is almost what CMake and downstream linkers expect, but it is missing three things:

```bash
export CUDA_HOME=$VIRTUAL_ENV/lib/python3.14/site-packages/nvidia/cu13

# (a) unversioned dev symlinks: wheels ship libcudart.so.13, linkers want libcudart.so
( cd $CUDA_HOME/lib && for f in lib*.so.*; do ln -sf "$f" "${f%%.so.*}.so"; done )   # subshell keeps your cwd

# (b) lib64 alias: some tools (flashinfer JIT) hardcode $CUDA_HOME/lib64
ln -sfn $CUDA_HOME/lib $CUDA_HOME/lib64

# (c) a libcuda stub for driver-API linking (pip ships no stubs/)
mkdir -p $CUDA_HOME/lib/stubs
ln -sf /usr/lib/x86_64-linux-gnu/libcuda.so $CUDA_HOME/lib/stubs/libcuda.so
```

Sanity check before the big build:

cat > /tmp/t.cu <<'EOF' #include