llama.cpp
# Clone llama.cpp and build it with the Metal (Apple GPU) backend enabled
git clone https://github.com/ggml-org/llama.cpp.git
cmake -S llama.cpp -B llama.cpp/build -DGGML_METAL=ON -DLLAMA_CURL=OFF
cmake --build llama.cpp/build --config Release -j 8
# Create a uv project next to the llama.cpp checkout and add the conversion dependencies
uv init quantization
cd quantization
uv add "torch>=2.9" "transformers>=4.45" "sentencepiece" "protobuf>=4.21,<5.0" "gguf>=0.19" "huggingface_hub"
Your pyproject.toml should now look like this:
pyproject.toml
[project]
name = "quantization"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
"gguf>=0.19",
"huggingface-hub>=1.16.1",
"protobuf>=4.21,<5.0",
"sentencepiece>=0.2.1",
"torch>=2.9",
"transformers>=4.45",
]
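# Create folders for the downloaded weights and the GGUF outputs
mkdir -p models gguf
# Log in to Hugging Face, then download the model
hf auth login
hf download google/gemma-4-E4B-it --local-dir models/gemma-4-E4B-it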
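Before converting, it can help to confirm that the download actually finished. The sketch below assumes the usual Hugging Face checkpoint layout (a config.json plus one or more *.safetensors weight files); the exact file set for this model is an assumption, and the helper name check_model_dir is hypothetical.

```python
from pathlib import Path


def check_model_dir(model_dir: str) -> dict:
    """Summarize a downloaded checkpoint directory (assumed Hugging Face layout)."""
    root = Path(model_dir)
    # Assumption: weights are stored as *.safetensors files at the top level.
    weights = sorted(root.glob("*.safetensors")) if root.is_dir() else []
    return {
        "has_config": (root / "config.json").is_file(),
        "weight_files": [p.name for p in weights],
        "total_bytes": sum(p.stat().st_size for p in weights),
    }


if __name__ == "__main__":
    # Path taken from the download command above.
    print(check_model_dir("models/gemma-4-E4B-it"))
```

A missing config.json or an empty weight list suggests the download was interrupted and should be rerun.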
models
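# Convert the Hugging Face checkpoint to a BF16 GGUF file
# (if the uv environment is not activated, `uv run python ...` should work instead)
python ../llama.cpp/convert_hf_to_gguf.py \
models/gemma-4-E4B-it \
--outfile gguf/gemma-4-E4B-it-BF16.gguf \
--outtype bf16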
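A quick way to check that the conversion produced a valid file is to read the GGUF header. The sketch below assumes the GGUF v2 or later layout in little-endian form (4-byte magic "GGUF", a uint32 version, then two uint64 counts); the function name read_gguf_header is hypothetical.

```python
import struct
from pathlib import Path


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header: magic, version, tensor and metadata counts."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4 (magic) + 4 (version) + 8 + 8 (counts)
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Assumption: little-endian layout with 64-bit counts (GGUF v2 and later).
    version, n_tensors, n_kv = struct.unpack("<IQQ", raw[4:])
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}


if __name__ == "__main__":
    # Output path taken from the conversion command above.
    out = Path("gguf/gemma-4-E4B-it-BF16.gguf")
    if out.exists():
        print(read_gguf_header(str(out)))
```

A ValueError here usually means the conversion was interrupted or the path points at the wrong file.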
Quantize the BF16 file to Q4_K_M:
Q4_K_M
../llama.cpp/build/bin/llama-quantize \
gguf/gemma-4-E4B-it-BF16.gguf \
gguf/gemma-4-E4B-it-Q4_K_M.gguf \
Q4_K_M
This step will take a few minutes.
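Once quantization finishes, comparing the two files on disk shows how much space was saved. This is a minimal sketch using only the file paths from the commands above; the helper name compression_report is hypothetical. BF16 stores 16 bits per weight, so a 4-bit-class format should come out several times smaller, though the exact ratio depends on the model.

```python
import os


def compression_report(original: str, quantized: str) -> dict:
    """Compare two model files on disk and report sizes and the size ratio."""
    orig_bytes = os.path.getsize(original)
    quant_bytes = os.path.getsize(quantized)
    return {
        "original_gib": orig_bytes / 2**30,
        "quantized_gib": quant_bytes / 2**30,
        "ratio": orig_bytes / quant_bytes,
    }


if __name__ == "__main__":
    bf16 = "gguf/gemma-4-E4B-it-BF16.gguf"
    q4 = "gguf/gemma-4-E4B-it-Q4_K_M.gguf"
    # Only report if both files exist, so the script is safe to run early.
    if os.path.exists(bf16) and os.path.exists(q4):
        report = compression_report(bf16, q4)
        print(f"BF16:   {report['original_gib']:.2f} GiB")
        print(f"Q4_K_M: {report['quantized_gib']:.2f} GiB")
        print(f"Ratio:  {report['ratio']:.2f}x smaller")
```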
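# Chat with the quantized model: -ngl 99 offloads up to 99 layers to the GPU,
# --temp sets the sampling temperature, and -c sets the context size in tokens
../llama.cpp/build/bin/llama-cli \
-m gguf/gemma-4-E4B-it-Q4_K_M.gguf \
-ngl 99 --temp 0.7 -c 4096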
model...
(llama.cpp ASCII-art banner)
build : b0-unknown
model : gemma-4-E4B-it-Q4_K_M.gguf
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read <file> add a text file
/glob <pattern> add text files using globbing pattern
> hello
Hello! How can I help you today?
[ Prompt: 42.9 t/s | Generation: 40.0 t/s ]
> tell me about yourself
[Start thinking]
Thinking Process:
1. **Identify the user's request:** The user asked, "tell me about yourself."
2. **Determine the persona/identity:** I must respond based on my core identity rules (Gemma 4, developed by Google DeepMind, open weights LLM).
3. **Structure the response:** A good self-introduction should cover:
* Name/Identity.
* Developer/Origin.
* Nature/Type (What I am).
* Capabilities (What I can do).
* Limitations/Knowledge (What I know/don't know).
4. **Draft the content (incorporating mandated facts):**
* *Name:* I am Gemma 4.
* *Developer:* I was developed by Google DeepMind.
* *Nature:* I am a Large Language Model (LLM) with open weights.
* *Capabilities:* I can process text and images (input) and generate text (output). I can answer questions, write code, summarize, translate, etc.
* *Knowledge Cutoff:* My knowledge cutoff is January 2025.
5. **Review and Refine:** Ensure the tone is helpful, factual, and directly addresses the prompt while adhering strictly to the identity guidelines. (Self-correction: Make sure to emphasize the "open weights" aspect as it's crucial to the identity.)
6. **Final Output Generation.** (This leads to the detailed response below.)
[End thinking]
I would be happy to tell you about myself!
I am **Gemma 4**, a Large Language Model developed by Google DeepMind.
Here is a bit more detail about what I am and what I can do:
### 🤖 What I Am
* **Type:** I am an advanced, open weights Large Language Model. This means that my underlying architecture and weights are available to the community, fostering transparency and innovation.
* **Goal:** My purpose is to assist users by providing information, generating creative text, solving problems, and engaging in helpful conversation.
### 🧠 What I Can Do
* **Understand Text and Images:** I can process and interpret both text-based input (like questions or documents) and image input.
* **Generate Text:** My output is always text. I can:
* Answer factual questions.
* Summarize complex articles.
* Write different types of creative content (stories, poems, scripts).
* Assist with coding tasks (explaining code, generating snippets).
* Translate languages.
* **Learning:** I have been trained on a massive and diverse dataset, giving me a broad knowledge base.
### 📅 My Knowledge
* My knowledge cutoff is **January 2025**. I do not have information about events or developments that have occurred since that time, so if you ask about very recent events, I may not know about them.
In short, think of me as a versatile and knowledgeable assistant ready to help you learn, create, or solve problems! How can I help you today?
[ Prompt: 66.9 t/s | Generation: 34.6 t/s ]
> which one is larger 9.11 or 9.9?
[Start thinking]
Thinking Process:
1. **Analyze the Request:** The user is asking to compare two numbers: 9.11 and 9.9, and determine which one is larger.
2. **Understand the Numbers:**
* Number A: 9.11 (Nine and eleven hundredths)
* Number B: 9.9 (Nine and nine tenths)
3. **Comparison Strategy (Decimal Place by Place):**
* *Units Place:* Both are 9. (Equal)
* *Tenths Place:*
* 9.11 has a 1 in the tenths place.
* 9.9 has a 9 in the tenths place.
* *Comparison:* Since 9 is greater than 1, the number 9.9 is larger than 9.11.
4. **Alternative Comparison Strategy (Padding):**
* Write 9.9 as 9.90 (Adding a trailing zero does not change the value).
* Compare 9.11 and 9.90.
* The hundredths place comparison: 1 vs 0. Wait, this is wrong. Let's stick to the tenths place comparison, as it was the clearest difference.
5. **Conclusion:** 9.9 > 9.11.
6. **Formulate the Answer:** State the answer clearly, and provide a simple explanation of *why* it is larger (comparing the tenths place).
[End thinking]
**9.9** is larger than 9.11.
Here is why:
When comparing decimals, you look at the digits starting from the left (the largest place value).
1. **Units Place:** Both numbers have a 9. (Equal)
2. **Tenths Place:**
* In 9.**1**1, the digit is **1**.
* In 9.**9**, the digit is **9**.
Since 9 is greater than 1, the number **9.9** is larger.
***
*(If you want to make the comparison even easier, you can add a zero to the end of 9.9 to make it 9.90. Now you are comparing 9.11 and 9.90, and 9.90 is clearly larger.)*
[ Prompt: 451.4 t/s | Generation: 34.9 t/s ]
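The bracketed stats lines in the session above can be collected to get a rough throughput figure for the quantized model. The sketch below assumes the line format stays exactly as shown in this transcript; the regular expression and the function name summarize_throughput are illustrative, not part of llama.cpp.

```python
import re
from statistics import mean

# Matches lines such as "[ Prompt: 42.9 t/s | Generation: 40.0 t/s ]".
STAT_RE = re.compile(
    r"\[\s*Prompt:\s*([\d.]+)\s*t/s\s*\|\s*Generation:\s*([\d.]+)\s*t/s\s*\]"
)


def summarize_throughput(log_text: str) -> dict:
    """Average the prompt and generation speeds found in a llama-cli transcript."""
    pairs = [(float(p), float(g)) for p, g in STAT_RE.findall(log_text)]
    if not pairs:
        raise ValueError("no throughput lines found")
    return {
        "turns": len(pairs),
        "mean_prompt_tps": mean(p for p, _ in pairs),
        "mean_generation_tps": mean(g for _, g in pairs),
    }


if __name__ == "__main__":
    # The three stats lines copied from the session above.
    sample = """
    [ Prompt: 42.9 t/s | Generation: 40.0 t/s ]
    [ Prompt: 66.9 t/s | Generation: 34.6 t/s ]
    [ Prompt: 451.4 t/s | Generation: 34.9 t/s ]
    """
    print(summarize_throughput(sample))
```

For the three turns above, this gives a mean generation speed of roughly 36.5 t/s. The prompt speed varies much more from turn to turn, so its mean is less informative.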