{"slug": "i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch", "title": "I Fine-Tuned a 270M Model on My Laptop (Full Fine-Tuning, From Scratch)", "summary": "A developer fully fine-tuned a 270M-parameter Gemma 3 model on a laptop using the Banking77 dataset, achieving ~96% accuracy on intent classification. The project used full fine-tuning with loss masking to focus only on label tokens, and the training config used a low learning rate of 5e-5 to preserve pretrained knowledge.", "body_md": "I wanted to actually *understand* fine-tuning — not run a tutorial and nod along. So I gave myself a constraint: **same task, three techniques, smallest model to largest.** Full fine-tuning, then LoRA, then QLoRA. Hold the task fixed and the only variable is the method.\n\nThis first post is full fine-tuning — the most powerful and most expensive option: **update every weight in the model.**\n\n[Banking77](https://huggingface.co/datasets/mteb/banking77): ~13,000 real bank customer-support messages, 77 intents like `card_arrival`, `lost_or_stolen_card`, `exchange_rate`. The model reads a message and names the intent.\n\nI picked **Gemma 3, 270M parameters** — small enough to fully fine-tune on a laptop (Apple Silicon / MPS). That's intentional: full fine-tuning stores gradients and optimizer state for *every* parameter, roughly 4× the model's size in memory. I wanted to *feel* that, not read about it.\n\nThe obvious approach is to bolt a 77-way classification head onto the model. I didn't. Instead I had the model **generate the intent as text** — literally output `card_arrival`. Why? 
Because that's the same shape as instruction-tuning, the later LoRA/QLoRA projects build naturally on this one.\n\nThe key detail is masking the loss so the model is graded *only* on the label tokens, not the prompt:\n\n```\n# build \"prompt + label\", but set prompt tokens to -100 so the loss ignores them\nprompt_ids = tokenizer(prompt, add_special_tokens=False)[\"input_ids\"]\ntarget_ids = tokenizer(\" \" + label_name + tokenizer.eos_token,\n                       add_special_tokens=False)[\"input_ids\"]\ninput_ids = prompt_ids + target_ids\nlabels    = [-100] * len(prompt_ids) + target_ids   # only the label is graded\n```\n\nIf you skip that masking, the model spends its capacity learning to reproduce the prompt instead of the answer.\n\nBecause you're updating *all* the pretrained weights, a too-high learning rate shreds the model's existing knowledge. I used 5e-5 and it trained cleanly. Bumping to 2e-4 destabilized it. The training config is otherwise unremarkable — and that's the point:\n\n```\nTrainingArguments(\n    num_train_epochs=3,\n    per_device_train_batch_size=16,\n    learning_rate=5e-5,            # small, on purpose\n    lr_scheduler_type=\"cosine\",\n    bf16=False, fp16=False,        # fp32 on MPS for stability\n)\n```\n\n(The later projects *freeze* the base, which is exactly why they can tolerate a much higher learning rate — the pretrained weights never change, so there's nothing fragile to wreck.)\n\n~96% on the common intents. A near-perfect diagonal confusion matrix. A 270M model, fully fine-tuned on a laptop, nailing the task.\n\nThe one persistent slip: it confused `card_arrival` with `card_delivery_estimate`.\n\nIn [Part 2](https://dev.to/sumanpro/lora-i-trained-1-of-a-15b-model-and-matched-a-full-fine-tune-41if), I take a model 5× bigger and train less than 1% of it — and get the same accuracy. 
That's LoRA.\n\n📓 **Full runnable notebook on Kaggle:** [https://www.kaggle.com/code/sumannath88/01-full-finetune-gemma270m](https://www.kaggle.com/code/sumannath88/01-full-finetune-gemma270m)\n\n*Built with PyTorch + Hugging Face Transformers. Questions or corrections welcome in the comments.*", "url": "https://wpnews.pro/news/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch", "canonical_source": "https://dev.to/sumanpro/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch-3p4l", "published_at": "2026-06-21 12:08:45+00:00", "updated_at": "2026-06-21 12:36:51.299285+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "natural-language-processing", "developer-tools"], "entities": ["Gemma 3", "Banking77", "Hugging Face", "PyTorch", "Kaggle", "LoRA", "QLoRA"], "alternates": {"html": "https://wpnews.pro/news/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch", "markdown": "https://wpnews.pro/news/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch.md", "text": "https://wpnews.pro/news/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch.txt", "jsonld": "https://wpnews.pro/news/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch.jsonld"}}