sudo apt update && sudo apt upgrade -y
If you don't use -hf
option, you don't need to install libssl-dev in this step.
sudo apt install build-essential cmake git libssl-dev -y
If nvidia-smi
shows a GPU/GPUs on your terminal, you will need to install the tooklit. This will take some time.
sudo apt install nvidia-cuda-toolkit -y
Build llama-cli and llama-server. This step also will take some time.
If you don't plan to use -hf
option, you don't need to use -DLLAMA_OPENSSL=ON
.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DLLAMA_OPENSSL=ON
cmake --build build --config Release
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
Run gemma-4-12b-it
with cli and server.
./build/bin/llama-cli -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL
> hello
[Start thinking]
The user said "hello".
The user is initiating a conversation.
Respond politely and offer assistance.
* "Hello! How can I help you today?"
* "Hi there! What's on your mind?"
* "Hello! Is there anything I can assist you with?"
[End thinking]
Hello! How can I help you today?
[ Prompt: 19.5 t/s | Generation: 11.8 t/s ]
or run web-ui
./build/bin/llama-server -hf unsloth/gemma-4-12b-it-GGUF:UD-Q4_K_XL --port 8080
mkdir -p models
wget -O models/gemma-4-12b-it-UD-Q4_K_XL.gguf https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/main/gemma-4-12b-it-UD-Q4_K_XL.gguf