Run Gemma-4 12B on WSL2 with llama.cpp

wpnews.pro

cd /news/large-language-models/run-gemma-4-12b-on-wsl2-with-llama-c… · home › topics › large-language-models › article

[ARTICLE · art-23104] src=dev.to ↗ pub=2026-06-06T03:22Z topic=large-language-models verified=true sentiment=· neutral

Run Gemma-4 12B on WSL2 with llama.cpp

A developer has published a guide for running Google's Gemma-4 12B instruction-tuned model on Windows Subsystem for Linux 2 (WSL2) using the llama.cpp framework. The process involves installing build tools, the NVIDIA CUDA toolkit for GPU acceleration, and compiling llama.cpp with CUDA support before loading the model from Hugging Face. The setup achieves approximately 19.5 tokens per second for prompt processing and 11.8 tokens per second for generation on compatible hardware.

read1 min views19 publishedJun 6, 2026

sudo apt update && sudo apt upgrade -y

If you don't use -hf

option, you don't need to install libssl-dev in this step.

sudo apt install build-essential cmake git libssl-dev -y

If nvidia-smi

shows a GPU/GPUs on your terminal, you will need to install the tooklit. This will take some time.

sudo apt install nvidia-cuda-toolkit -y

Build llama-cli and llama-server. This step also will take some time.

If you don't plan to use -hf

option, you don't need to use -DLLAMA_OPENSSL=ON

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DLLAMA_OPENSSL=ON
cmake --build build --config Release

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

Run gemma-4-12b-it

with cli and server.

./build/bin/llama-cli -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL
> hello

[Start thinking]
The user said "hello".
The user is initiating a conversation.
Respond politely and offer assistance.

    *   "Hello! How can I help you today?"
    *   "Hi there! What's on your mind?"
    *   "Hello! Is there anything I can assist you with?"
[End thinking]

Hello! How can I help you today?

[ Prompt: 19.5 t/s | Generation: 11.8 t/s ]

or run web-ui

./build/bin/llama-server -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL --port 8080
mkdir -p models
wget -O models/gemma-4-12b-it-UD-Q4_K_XL.gguf https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/main/gemma-4-12b-it-UD-Q4_K_XL.gguf

source & further reading

dev.to — original article MCPMark v2: InsForge on Sonnet 4.6 InsForge vs Firebase: AI-Native Postgres Alternative InsForge vs Supabase: AI-Native Backend Alternative

~/api · this article 200

$curl api.wpnews.pro/v1/news/run-gemma-4-12b-on-wsl2-…

Read original on dev.to → dev.to/0xkoji/run-gemma-4-12b-on-wsl2-with-llama…

mentioned entities

Gemma-4

llama.cpp

WSL2

NVIDIA

CUDA

unsloth

Hugging Face

GGUF

metadata

slugrun-gemma-4-12b-on-wsl2-with-llama-cpp

topic#large-language-models

secondary4 topics

sentimentneutral

canonicaldev.to

navigation

← prevNVIDIA Launches RTX Spark Window…

next →Yoland Yan: Comfy UI revolutioni…

── more in #large-language-models 4 stories · sorted by recency

github.com · 22 Jul · #large-language-models

Bw24 – from scratch rust+CUDA inference, every kernel tuned for sm_120a

cryptobriefing.com · 22 Jul · #large-language-models

Google’s Gemini tech boosts ad relevance, conversion rates

qainsights.com · 22 Jul · #large-language-models

Mixture of Experts (MoE) Explained: How It Works with Simple Examples

cryptobriefing.com · 22 Jul · #large-language-models

Alphabet profits soar to $120B, driven by AI investments

── more on @gemma-4 3 stories trending now

wpnews · 30 May · #ai-safety

Nightcord Security Analysis Report - Threat Investigation

wpnews · 26 May · #ai-agents

Think, Durable Objects, and the Real Shape of AI Applications

wpnews · 8 Jul · #ai-tools

What's the Future of Clay?

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required