I Fine-Tuned a 270M Model on My Laptop (Full Fine-Tuning, From Scratch)

A developer fully fine-tuned the 270M-parameter Gemma 3 model on a laptop using the Banking77 dataset, reaching ~96% accuracy on the common intents. The project used full fine-tuning with loss masking so that only the label tokens are graded, and a low learning rate of 5e-5 to preserve pretrained knowledge.

Published Jun 21, 2026 · 2 min read

I wanted to actually understand fine-tuning — not run a tutorial and nod along. So I gave myself a constraint: same task, three techniques, smallest model to largest. Full fine-tuning, then LoRA, then QLoRA. Hold the task fixed and the only variable is the method.

This first post is full fine-tuning — the most powerful and most expensive option: update every weight in the model.

Banking77: ~13,000 real bank customer-support messages, 77 intents like card_arrival, lost_or_stolen_card, exchange_rate. The model reads a message and names the intent.

I picked Gemma 3, 270M parameters — small enough to fully fine-tune on a laptop (Apple Silicon / MPS). That's intentional: full fine-tuning stores gradients and optimizer state for every parameter, roughly 4× the model's size in memory. I wanted to feel that, not read about it.
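To make the "roughly 4×" figure concrete, here is a back-of-envelope sketch. It assumes fp32 values and an Adam-style optimizer that keeps two moment buffers per parameter (the post does not name the optimizer, so treat that as an assumption), and it ignores activations, which add more on top.

```python
# Back-of-envelope memory for full fine-tuning in fp32.
# Assumption: an Adam-style optimizer with two moment buffers per parameter.
BYTES_PER_FP32 = 4

def full_finetune_bytes(num_params: int, bytes_per_value: int = BYTES_PER_FP32) -> dict:
    weights = num_params * bytes_per_value
    gradients = num_params * bytes_per_value       # one gradient per weight
    optimizer = 2 * num_params * bytes_per_value   # first and second moments
    total = weights + gradients + optimizer
    return {"weights": weights, "gradients": gradients,
            "optimizer": optimizer, "total": total}

# 270M is the nominal parameter count; the exact figure may differ slightly.
estimate = full_finetune_bytes(270_000_000)
for name, size in estimate.items():
    print(f"{name:>9}: {size / 1e9:.2f} GB")
```

Under these assumptions the total is four times the weights alone, which is where the 4× rule of thumb comes from.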

The obvious approach is to bolt a 77-way classification head onto the model. I didn't. Instead I had the model generate the intent as text: literally output card_arrival. Why? Because that's the same shape as instruction-tuning, so the later LoRA/QLoRA projects build naturally on this one.

The key detail is masking the loss so the model is graded only on the label tokens, not the prompt:

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
target_ids = tokenizer(" " + label_name + tokenizer.eos_token,
                       add_special_tokens=False)["input_ids"]
input_ids = prompt_ids + target_ids
labels    = [-100] * len(prompt_ids) + target_ids   # only the label is graded

If you skip that masking, the model spends its capacity learning to reproduce the prompt instead of the answer.
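The effect of the -100 labels can be illustrated without a model. The sketch below is a pure-Python stand-in, not the notebook's code: the token ids and per-token probabilities are made up, and `masked_mean_nll` is a hypothetical helper that mimics how a cross-entropy loss with an ignore index of -100 averages only over the graded positions.

```python
import math

IGNORE_INDEX = -100  # same sentinel as in the snippet above

def build_labels(prompt_ids, target_ids):
    """Concatenate prompt and target; hide the prompt positions from the loss."""
    input_ids = list(prompt_ids) + list(target_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(target_ids)
    return input_ids, labels

def masked_mean_nll(token_probs, labels):
    """Mean negative log-likelihood over positions whose label is not ignored.

    token_probs[i] is the probability the model gave to the token at position i.
    """
    graded = [-math.log(p) for p, y in zip(token_probs, labels) if y != IGNORE_INDEX]
    return sum(graded) / len(graded)

# Hypothetical ids: a 4-token prompt followed by a 2-token label.
input_ids, labels = build_labels([11, 12, 13, 14], [21, 22])

# Made-up probabilities: poor on the prompt, good on the label.
probs = [0.01, 0.01, 0.01, 0.01, 0.9, 0.8]

loss = masked_mean_nll(probs, labels)
unmasked = sum(-math.log(p) for p in probs) / len(probs)
print(f"masked loss:   {loss:.3f}")
print(f"unmasked loss: {unmasked:.3f}")
```

With masking, the loss reflects only how well the label was predicted; without it, the badly predicted prompt tokens dominate the average and pull the gradient toward reproducing the prompt.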

Because you're updating all the pretrained weights, a too-high learning rate shreds the model's existing knowledge. I used 5e-5 and it trained cleanly. Bumping to 2e-4 destabilized it. The training config is otherwise unremarkable — and that's the point:

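from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",              # placeholder path, not from the original config
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=5e-5,            # small, on purpose
    lr_scheduler_type="cosine",
    bf16=False, fp16=False,        # fp32 on MPS for stability
)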

(The later projects freeze the base, which is exactly why they can tolerate a much higher learning rate: the pretrained weights are never updated, so there is no fragile knowledge to wreck.)
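For intuition on what lr_scheduler_type="cosine" does to that 5e-5, here is a minimal sketch of a cosine decay with no warmup (the config shows none, so that is assumed). `cosine_lr` and the step count of 1000 are illustrative names and numbers, not taken from the notebook.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 5e-5) -> float:
    """Cosine decay from peak_lr at step 0 down to 0 at total_steps (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 1000  # hypothetical; the real value depends on dataset size and epochs
for step in (0, 250, 500, 750, 1000):
    print(f"step {step:>4}: lr = {cosine_lr(step, TOTAL_STEPS):.2e}")
```

The rate starts at the peak, passes through half the peak at the midpoint, and reaches zero at the end, so the largest updates happen early and the final steps only make small adjustments.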

The result: ~96% accuracy on the common intents, with a near-perfect diagonal confusion matrix. A 270M model, fully fine-tuned on a laptop, nailing the task.

The one persistent slip: it confused card_arrival with card_delivery_estimate.
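Because the model emits the intent as text, scoring comes down to comparing strings. The sketch below shows one plausible way to do that and to surface confused pairs like the one above; `score` is a hypothetical helper and the four examples are toy data, not results from the notebook.

```python
from collections import Counter

def score(predictions, references):
    """Exact-match accuracy plus a count of (true, predicted) confusions."""
    pairs = [(pred.strip(), ref) for pred, ref in zip(predictions, references)]
    correct = sum(pred == ref for pred, ref in pairs)
    confusions = Counter((ref, pred) for pred, ref in pairs if pred != ref)
    return correct / len(pairs), confusions

# Toy data for illustration only.
references = ["card_arrival", "card_arrival", "exchange_rate", "lost_or_stolen_card"]
predictions = [" card_arrival", "card_delivery_estimate", "exchange_rate", "lost_or_stolen_card"]

accuracy, confusions = score(predictions, references)
print(f"accuracy: {accuracy:.2f}")
for (true_label, predicted), count in confusions.most_common():
    print(f"{true_label} -> {predicted}: {count}")
```

The strip() call matters here because the training target starts with a leading space, so the generated text may carry one too.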

In Part 2, I take a model 5× bigger and train less than 1% of it — and get the same accuracy. That's LoRA.

📓 Full runnable notebook on Kaggle: https://www.kaggle.com/code/sumannath88/01-full-finetune-gemma270m

Built with PyTorch + Hugging Face Transformers. Questions or corrections welcome in the comments.

Read original on dev.to → dev.to/sumanpro/i-fine-tuned-a-270m-model-on-my-…
