cd /news/large-language-models/run-gemma-4-12b-on-wsl2-with-llama-c… · home topics large-language-models article
[ARTICLE · art-23104] src=dev.to pub= topic=large-language-models verified=true sentiment=· neutral

Run Gemma-4 12B on WSL2 with llama.cpp

A developer has published a guide for running Google's Gemma-4 12B instruction-tuned model on Windows Subsystem for Linux 2 (WSL2) using the llama.cpp framework. The process involves installing build tools, the NVIDIA CUDA toolkit for GPU acceleration, and compiling llama.cpp with CUDA support before loading the model from Hugging Face. The setup achieves approximately 19.5 tokens per second for prompt processing and 11.8 tokens per second for generation on compatible hardware.

read1 min publishedJun 6, 2026
sudo apt update && sudo apt upgrade -y

If you don't use -hf

option, you don't need to install libssl-dev in this step.

sudo apt install build-essential cmake git libssl-dev -y

If nvidia-smi

shows a GPU/GPUs on your terminal, you will need to install the tooklit. This will take some time.

sudo apt install nvidia-cuda-toolkit -y

Build llama-cli and llama-server. This step also will take some time.

If you don't plan to use -hf

option, you don't need to use -DLLAMA_OPENSSL=ON

.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DLLAMA_OPENSSL=ON
cmake --build build --config Release

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Run gemma-4-12b-it

with cli and server.

./build/bin/llama-cli -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL
> hello

[Start thinking]
The user said "hello".
The user is initiating a conversation.
Respond politely and offer assistance.

    *   "Hello! How can I help you today?"
    *   "Hi there! What's on your mind?"
    *   "Hello! Is there anything I can assist you with?"
[End thinking]

Hello! How can I help you today?

[ Prompt: 19.5 t/s | Generation: 11.8 t/s ]

or run web-ui

./build/bin/llama-server -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL --port 8080
mkdir -p models
wget -O models/gemma-4-12b-it-UD-Q4_K_XL.gguf https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/main/gemma-4-12b-it-UD-Q4_K_XL.gguf
── more in #large-language-models 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/run-gemma-4-12b-on-w…] indexed:0 read:1min 2026-06-06 ·