{"slug": "a-tiny-local-model-can-sort-tickets", "title": "A tiny local model can sort tickets", "summary": "Developer Torgeir Helgevold fine-tuned a 600-million-parameter local LLM (Qwen 3:0.6B) to classify household questions into metadata categories, achieving 92% accuracy on a test set, up from 10% with prompting alone. The fine-tuned model, which runs on a laptop, demonstrates that small local models can reliably handle narrow classification tasks like ticket routing, offering cost, privacy, and reliability advantages over cloud-based frontier models.", "body_md": "The developer Torgeir Helgevold runs a chatbot that answers questions about his house: who cleaned the gutters, which painter did the downstairs, when the pool pump was last replaced. The bot pulls answers from a vector database, but first classifies each question into a metadata category (pool, car, hvac, cooking, gutters) and narrows the search to just that category’s entries. The classification step is the part that broke.\n\nThe chatbot uses two local models: Qwen 3:4B for general question answering, and Qwen 3:0.6B (a 600-million-parameter model small enough to run on a laptop) for categorisation. The open question was whether that tiny model could be fine-tuned into a reliable classifier. The hypothesis Helgevold set out to test, in [his write-up](https://www.teachmecoolstuff.com/viewarticle/fine-tuning-a-local-llm-to-categorize-questions): “a very small local LLM can be fine-tuned to perform reliable question categorization when trained on a dataset of household-related questions.”\n\n## The numbers\n\nThe baseline (the same 0.6B model used straight out of the box, with a careful prompt) scored 13 out of 131 on a held-out test set. That is about 10%. 
The model kept inventing categories that were not on its list (one log line read *Ollama returned an unknown category name “apartments” from response “apartments”*) and over-using broad labels like *electric* and *appliances*.\n\nFine-tune number one, using Unsloth with QLoRA on about 850 entries split 70/15/15, lifted the score to 104 out of 131 (79%). Fine-tune number two (the same data and the same method, but with each category swapped for an opaque two-letter code such as AA, BB or CC) reached 120 out of 131 (92%).\n\nOn the same 131-question test set, that is 10% for the prompt-only baseline, 79% after the first fine-tune and 92% after the second fine-tune with opaque codes, all on a 600-million-parameter model small enough to run on a laptop.\n\n## Why this is the lesson for a small UK firm\n\nThe lesson is not “Qwen 3:0.6B is the best classifier ever”. It is that for narrow, repetitive classification (routing support tickets, tagging inbound emails, sorting enquiries by department, screening job applications, flagging supplier invoices) prompting a big cloud model is the wrong tool. A fine-tuned local model scored 92% on a job a 600M-parameter model had no business doing; a frontier model would also score well, but only at the price of per-call fees, a third-party API and whatever latency and rate limits come with the plan.\n\nA local fine-tune flips three of those dials:\n\n- **Cost.** After the one-off training run, inference is electricity on hardware you already own. A 0.6B-parameter model runs on a small office server or a spare laptop.\n- **Privacy.** Customer messages, supplier names and contract details never leave the building. That is the line you put in front of a sceptical partner or DPO.\n- **Reliability.** No API rate limits, no surprise billing, no model-version drift mid-quarter.\n\nThe toolkit is cheap to try. 
Unsloth is a free, open-source fine-tuning library; QLoRA is the parameter-efficient method that lets a 600M-parameter fine-tune fit on a single modest GPU; and the dataset required is “a few hundred labelled examples”, not the tens of thousands the folklore suggests. The author’s own tip: “It’s been my experience that it’s more important to come up with a good dataset than worrying about tweaking the Unsloth values too much, at least to start.”\n\n## The wrinkle worth knowing\n\nThe most interesting finding is buried in the middle of the post. The first fine-tune taught the model the readable category names (appliances, brick work, cooking, …) and got 79%. Helgevold suspected the model was getting confused by semantically overlapping labels, water-related ones especially, where *pool*, *water heater* and *fountain* share a root concept. The fix was not more data and not better hyperparameters. It was replacing the readable labels with fixed, non-overlapping two-letter codes. Accuracy jumped to 92%. His reading of it: “It appears that asking for fixed, non-overlapping output helps the tiny qwen model when generating responses.”\n\nThe wider point is that fine-tuning is partly a labelling problem. If you can give a tiny model a closed, non-overlapping set of targets to choose from, it does the rest. Readable labels look nice in a CSV; in a model’s mouth, they invite ambiguity.\n\n## How to try it this week\n\nFor a UK firm with a repetitive classification job (a shared inbox, a help-desk queue, a daily flow of supplier invoices or job applications) the path is shorter than the folklore suggests.\n\n1. **Pull together a few hundred labelled examples.** A CSV of *question*, *label* is enough. Quality matters more than quantity: spend an afternoon curating. 
Include the awkward cases.\n2. **Pick a tiny base model that runs locally.** Qwen 3:0.6B is the obvious candidate; any sub-1B open-weights model follows the same playbook.\n3. **Use Unsloth with QLoRA.** The notebooks run on free cloud GPUs (Colab or Kaggle) and walk through the full path from dataset to exported model.\n4. **Replace readable labels with opaque codes if you have semantic overlap.** Test both. Codes win when readable labels share a root concept.\n5. **Export and ship locally.** Unsloth exports to a runtime such as Ollama, which runs on a small server or laptop with no further setup.\n\nThe cost of finding out is one afternoon and a free-tier GPU; the upside is a classifier that runs on kit you own, never phones home, and never sends a usage bill.", "url": "https://wpnews.pro/news/a-tiny-local-model-can-sort-tickets", "canonical_source": "https://www.runagentrun.co.uk/articles/tiny-local-model-cheap-classifier/", "published_at": "2026-06-22 00:00:00+00:00", "updated_at": "2026-06-24 01:00:26.439022+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "ai-research"], "entities": ["Torgeir Helgevold", "Qwen 3:0.6B", "Unsloth", "QLoRA"], "alternates": {"html": "https://wpnews.pro/news/a-tiny-local-model-can-sort-tickets", "markdown": "https://wpnews.pro/news/a-tiny-local-model-can-sort-tickets.md", "text": "https://wpnews.pro/news/a-tiny-local-model-can-sort-tickets.txt", "jsonld": "https://wpnews.pro/news/a-tiny-local-model-can-sort-tickets.jsonld"}}