{"slug": "foundation-vs-instruct-vs-chat-models-one-question-three-answers", "title": "Foundation vs. Instruct vs. Chat Models: One Question, Three Answers", "summary": "A developer demonstrated the differences between foundation, instruct, and chat models using Hugging Face's SmolLM2-135M family in a free Google Colab notebook. By asking the same question to three model variants, the tutorial shows that foundation models act as text completers, instruct models follow formatted instructions, and chat models use conversational context. The instruct and chat models share the same weights but differ in how prompts are structured.", "body_md": "*A hands-on tutorial you can run for free in Google Colab.*\n\nRun it yourself: open `foundation_instruct_chat_tutorial.ipynb` in Google Colab and run every cell top to bottom. It uses the SmolLM2-135M family, small enough for a free CPU runtime, no GPU needed.\n\nPeople say \"LLM,\" \"GPT,\" \"an AI model,\" and \"ChatGPT\" as if they were the same thing. They aren't. There's a ladder of training stages between \"a model that read the internet\" and \"an assistant you can chat with,\" and the words **foundation**, **instruct**, and **chat** mark the rungs.\n\nThe cleanest way to feel the difference is to do something deliberately unfair: ask the **exact same question** to three versions of the **same model family** and watch how differently they behave. Our question is deliberately boring so the *behavior* stands out:\n\n\"What is the capital of France?\"\n\nWe use checkpoints from Hugging Face's SmolLM2 family in three modes:\n\n| Model type | Hugging Face ID | One-line summary |\n|---|---|---|\n| Foundation (base) | `HuggingFaceTB/SmolLM2-135M` | Predicts the next token. Knows things, isn't helpful. |\n| Instruct | `HuggingFaceTB/SmolLM2-135M-Instruct` | Fine-tuned to follow a single instruction. |\n| Chat | `HuggingFaceTB/SmolLM2-135M-Instruct` (used conversationally) | Same weights, driven through a multi-turn message list. 
|\n\nNotice that the chat row reuses the instruct checkpoint. That's not a shortcut — it's the honest reality, and we'll come back to why.\n\nA **foundation model** (also called a *base* or *pretrained* model) is trained on exactly one objective: given a stretch of text, **predict the next token**. Nothing else. It reads a huge slice of the internet and gets very good at continuing text in a statistically plausible way.\n\nWhat it is *never* taught is that a question deserves an answer. So when you feed it:\n\n```\nWhat is the capital of France?\n```\n\nit doesn't think *\"I should answer that.\"* It thinks *\"On the internet, what usually **comes after** a line like this?\"* And the answer is often… **more quiz questions**, a worksheet, or a tangent:\n\n```\nWhat is the capital of France? What is the capital of Germany? What is the\ncapital of Italy? ...\n```\n\nIn the notebook we pass the raw string straight into the pipeline with no formatting:\n\n```\nfrom transformers import pipeline\ntest_query = \"What is the capital of France?\"  # reused for all three models\nbase_pipe = pipeline(\"text-generation\", model=\"HuggingFaceTB/SmolLM2-135M\")\n# Raw string in, greedy decoding (do_sample=False), no chat template.\nbase_raw_out = base_pipe(test_query, max_new_tokens=30, do_sample=False)\nprint(base_raw_out[0]['generated_text'])\n```\n\n**Takeaway:** a foundation model is a **text completer**, not an assistant. It contains enormous knowledge but has no concept of being *helpful*. It's the raw clay everything else is shaped from.\n\nAn **instruct model** starts from that same base model and goes through a second stage of training — **fine-tuning on (instruction → response) pairs**. Thousands to millions of examples of the shape *\"Here's a request. 
Here's a good response.\"* This teaches the model a new contract: **when the user asks for something, actually do it and then stop.**\n\nBut there's a crucial detail people miss: an instruct model behaves reliably only when you wrap your text in the **exact special format it was trained on.** That format uses control tokens. For SmolLM2 they look roughly like this:\n\n```\n<|im_start|>user\nWhat is the capital of France?<|im_end|>\n<|im_start|>assistant\n```\n\nYou don't type those tokens by hand. Instruct models on the Hugging Face Hub typically ship with a **chat template** stored alongside the tokenizer that builds them for you:\n\n```\nfrom transformers import AutoTokenizer\n\ninstruct_id = \"HuggingFaceTB/SmolLM2-135M-Instruct\"\ntokenizer = AutoTokenizer.from_pretrained(instruct_id)\nformatted_prompt = tokenizer.apply_chat_template(\n    [{\"role\": \"user\", \"content\": test_query}],\n    tokenize=False,              # return a string, not token IDs\n    add_generation_prompt=True,  # appends the 'assistant' cue\n)\n```\n\nFeed *that* to the instruct model and you get a clean, direct answer:\n\n```\nThe capital of France is Paris.\n```\n\nThe notebook prints the formatted prompt **before** generating, so you can literally see the hidden scaffolding the model receives. That \"aha\" — *oh, there's a whole structure under the hood* — is the most important thing in the tutorial.\n\n**Takeaway:** an instruct model = a base model **+ instruction tuning + a required prompt format**. Skip the format and even a well-trained instruct model can fall back to rambling.\n\nHere's the part that surprises people: a **chat model is usually the same weights as the instruct model.** The difference isn't *what* the model is — it's *how you drive it.*\n\nInstead of one instruction in, one response out, you maintain a **running list of role-tagged messages**:\n\n```\nchat_pipe = pipeline(\"text-generation\", model=\"HuggingFaceTB/SmolLM2-135M-Instruct\")\nchat_history = [\n    {\"role\": \"user\", \"content\": \"What is the capital of France?\"},\n]\n# Passing a list of messages (not a string) makes the pipeline apply the chat template.\nchat_out = chat_pipe(chat_history, max_new_tokens=30)\n```\n\nThe pipeline applies the chat template for you and returns the **whole conversation** with the assistant's reply appended. 
For a single turn, that looks identical to the instruct example. The magic only appears when the conversation **continues**.\n\nSo in the notebook we append the reply and ask a deliberately vague follow-up:\n\n```\nconversation = chat_out[0]['generated_text']        # user + assistant so far\nconversation.append({\"role\": \"user\",\n                     \"content\": \"And what is a famous landmark there?\"})\nfollow_up = chat_pipe(conversation, max_new_tokens=40)\nprint(follow_up[0]['generated_text'][-1]['content'])  # newest assistant reply\n```\n\nThe word **\"there\"** is meaningless on its own. But because we passed the *entire history*, the model can resolve \"there\" → **Paris** and name a landmark. That carried-over context is what turns a one-shot Q&A into something that feels like a conversation.\n\n**Takeaway:** a chat model is an instruct model **driven through a multi-turn message list**, so each new turn can use the previous turns as context. The system prompt, the `user`/`assistant` roles, and the growing history are the \"chat\" part.\n\n| Model | Trained to… | You give it… | Reply to \"What is the capital of France?\" |\n|---|---|---|---|\n| Foundation | continue text | a raw string | echoes / continues the document, may never answer |\n| Instruct | follow one instruction | a chat-templated string | a direct answer: \"The capital of France is Paris.\" |\n| Chat | converse over many turns | a list of messages | a direct answer, with earlier turns available as context for follow-ups |\n\nRead top to bottom, it's a progression, not three unrelated things: pretraining yields the foundation model (stage 1), instruction tuning turns it into an instruct model (stage 2), and driving that model through a message history makes it a chat model (stage 3).\n\nWhen you talk to a commercial assistant, you're using stage 3, sitting on stage 2, built on stage 1.\n\nSmolLM2-135M is **tiny** — about 135 million parameters, versus the tens or hundreds of *billions* in frontier models. At this size the model will sometimes get a fact wrong, repeat itself, or trail off. 
**That's expected, and it's not the point.** The tutorial is designed to make the *behavioral* gap between the three modes visible on a free laptop or Colab CPU — not to win a trivia contest. The exact same three-stage structure scales all the way up to the largest models in production.\n\nTo run the notebook:\n\n1. Get `foundation_instruct_chat_tutorial.ipynb`.\n2. Open it in Colab (`File → Open notebook → Upload`, or push it to GitHub and use the Colab badge).\n3. Run every cell (`Runtime → Run all`). The first run downloads the models, so give it a minute.\n\nThen experiment:\n\n- Change `test_query` to something open-ended like `\"Write a haiku about the sea.\"` and watch how the three modes diverge even more.\n- Try `do_sample=True` with `temperature=0.7` for more varied, creative output.\n- Swap in `HuggingFaceTB/SmolLM2-360M-Instruct` and feel the quality jump.\n\nOnce you've *seen* the three behaviors with your own eyes, the vocabulary — base, instruct, chat, chat template, system prompt — stops being jargon and starts being obvious.\n\n*Happy experimenting!* 🚀", "url": "https://wpnews.pro/news/foundation-vs-instruct-vs-chat-models-one-question-three-answers", "canonical_source": "https://dev.to/vishalmysore/foundation-vs-instruct-vs-chat-models-one-question-three-answers-3gi", "published_at": "2026-06-16 23:08:32+00:00", "updated_at": "2026-06-16 23:21:16.079616+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "natural-language-processing"], "entities": ["Hugging Face", "SmolLM2-135M", "Google Colab", "HuggingFaceTB/SmolLM2-135M", "HuggingFaceTB/SmolLM2-135M-Instruct"], "alternates": {"html": "https://wpnews.pro/news/foundation-vs-instruct-vs-chat-models-one-question-three-answers", "markdown": "https://wpnews.pro/news/foundation-vs-instruct-vs-chat-models-one-question-three-answers.md", "text": "https://wpnews.pro/news/foundation-vs-instruct-vs-chat-models-one-question-three-answers.txt", "jsonld": "https://wpnews.pro/news/foundation-vs-instruct-vs-chat-models-one-question-three-answers.jsonld"}}