# Run Gemma-4 12B on WSL2 with llama.cpp

> Source: <https://dev.to/0xkoji/run-gemma-4-12b-on-wsl2-with-llamacpp-1o2m>
> Published: 2026-06-06 03:22:37+00:00



```
sudo apt update && sudo apt upgrade -y
```

If you don't use `-hf`

option, you don't need to install libssl-dev in this step.

```
sudo apt install build-essential cmake git libssl-dev -y
```

If `nvidia-smi`

shows a GPU/GPUs on your terminal, you will need to install the tooklit. This will take some time.

```
sudo apt install nvidia-cuda-toolkit -y
```

Build llama-cli and llama-server. This step also will take some time.

If you don't plan to use `-hf`

option, you don't need to use `-DLLAMA_OPENSSL=ON`

.

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DLLAMA_OPENSSL=ON
cmake --build build --config Release

# no GPU
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

Run `gemma-4-12b-it`

with cli and server.

```
./build/bin/llama-cli -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL
> hello

[Start thinking]
The user said "hello".
The user is initiating a conversation.
Respond politely and offer assistance.

    *   "Hello! How can I help you today?"
    *   "Hi there! What's on your mind?"
    *   "Hello! Is there anything I can assist you with?"
[End thinking]

Hello! How can I help you today?

[ Prompt: 19.5 t/s | Generation: 11.8 t/s ]
```

or run `web-ui`

```
./build/bin/llama-server -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL --port 8080
mkdir -p models
wget -O models/gemma-4-12b-it-UD-Q4_K_XL.gguf https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/main/gemma-4-12b-it-UD-Q4_K_XL.gguf
```


