Run Gemma-4 E2B-it with llama.cpp on Raspberry Pi 4

A developer ran Google's Gemma-4 E2B-it large language model on a Raspberry Pi 4 using llama.cpp, reaching text generation speeds of 1.5 to 1.8 tokens per second. The project involved converting the model to GGUF format and compiling llama.cpp with Clang and ARM NEON optimizations for the ARM-based single-board computer.

Tested Gemma-4 E2B-it on a Raspberry Pi 4. For how Gemma-4 E2B-it was converted to GGUF models, see:
https://huggingface.co/baxin/gemma-4-E4B-it-E2B-it-Q4_K_M

Build llama.cpp (LLM inference in C/C++):

    git clone https://github.com/ggml-org/llama.cpp.git
    cd llama.cpp
    cmake -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release

The command below was run from the llama.cpp folder, with gemma-4-E2B-it-Q4_K_M.gguf placed in a models folder that sits next to llama.cpp (hence the ../models/ path).

Folder structure:

    .
    ├── llama.cpp
    └── models
        └── gemma-4-E2B-it-Q4_K_M.gguf

    ./build/bin/llama-cli -m ../models/gemma-4-E2B-it-Q4_K_M.gguf -t 4 -tb 4 -c 2048 -fa auto --prio 3 -p "hello"

Output:

    build      : b9425-0821c5fcf
    model      : gemma-4-E2B-it-Q4_K_M.gguf
    modalities : text

    available commands:
      /exit or Ctrl+C     stop or exit
      /regen              regenerate the last response
      /clear              clear the chat history
      /read
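As a sketch of the run step, assuming Python 3.9+ and the folder layout described above (llama.cpp/ and models/ side by side), the helper below assembles the same llama-cli arguments and checks that the binary and the model file exist before launching. The function names are hypothetical, and the flag comments reflect our reading of llama.cpp's option help for recent builds; confirm them with ./build/bin/llama-cli --help on your own build.

```python
import subprocess
from pathlib import Path

# Hypothetical constant: the quantized model file name used in this post.
MODEL_FILE = "gemma-4-E2B-it-Q4_K_M.gguf"


def build_llama_cli_command(model_file: str = MODEL_FILE) -> list[str]:
    """Return the llama-cli argv from the post, relative to the llama.cpp folder."""
    return [
        "./build/bin/llama-cli",
        "-m", f"../models/{model_file}",  # models/ is a sibling of llama.cpp/
        "-t", "4",       # generation threads: the Pi 4 has four Cortex-A72 cores
        "-tb", "4",      # threads used for batch and prompt processing
        "-c", "2048",    # context size in tokens
        "-fa", "auto",   # flash attention: let llama.cpp decide
        "--prio", "3",   # process/thread priority (3 = realtime in recent builds)
        "-p", "hello",   # initial prompt
    ]


def run_llama_cli(llama_cpp_dir) -> None:
    """Check the expected files exist, then run llama-cli from inside llama.cpp/."""
    llama_cpp_dir = Path(llama_cpp_dir)
    binary = llama_cpp_dir / "build" / "bin" / "llama-cli"
    model = llama_cpp_dir.parent / "models" / MODEL_FILE
    for path in (binary, model):
        if not path.is_file():
            raise FileNotFoundError(path)
    # cwd must be llama.cpp/ so that ./build/bin and ../models resolve correctly.
    subprocess.run(build_llama_cli_command(), cwd=llama_cpp_dir, check=True)
```

Calling run_llama_cli("llama.cpp") from the parent directory mirrors running the command by hand; keeping the argument list in one function makes it easy to adjust -t or -c while benchmarking.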
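To put 1.5 to 1.8 tokens per second in perspective, reply latency can be roughly estimated as tokens divided by rate. This sketch assumes a steady decode rate and ignores model loading and prompt processing time; the 200-token reply length is an arbitrary example, not a measurement from the post.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock time to generate num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    for rate in (1.5, 1.8):  # the range reported on the Raspberry Pi 4
        print(f"{rate} tok/s -> 200 tokens in {generation_seconds(200, rate):.0f} s")
```

At those rates a 200-token reply takes roughly 111 to 133 seconds, which is usable for short prompts but slow for long answers.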