Train LLM from Scratch

[ARTICLE · art-35281] src=FareedKhan-dev.github.io ↗ pub=2026-06-21T03:30Z topic=large-language-models verified=true sentiment=↑ positive

A developer trained a large language model from scratch using plain PyTorch, implementing the full post-training pipeline including SFT, reward modeling, DPO, PPO, and GRPO on public datasets, all runnable on a single GPU or scaled with DDP. The project emphasizes a modular design that wraps the base transformer without rewriting it, enabling instruction following and reasoning capabilities.

Published Jun 21, 2026 · 3 min read

When I first trained this transformer from scratch, it could continue text, but it couldn't follow instructions or reason. That's what post-training fixes. This `docs/` folder walks through the whole journey I built on top of the base model: every stage written from scratch in plain PyTorch (no `trl`, no `peft`, no `transformers`), trained on real public datasets, and runnable on a single GPU or scaled across multiple GPUs with DDP.

If you are new to LLM training internals, start with the new **LLM Foundations** section before reading the stage pages. It explains the token shapes, decoder-only Transformer, attention masks, objectives, optimization loop, and generation mechanics that every later page relies on.
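As a quick taste of two of those foundations, here is a minimal sketch of how next-token targets and a causal attention mask can be built. It is plain Python rather than the repository's PyTorch code, and the function names and token ids are hypothetical, chosen only for illustration.

```python
def next_token_pairs(token_ids):
    """Next-token prediction: the input is tokens[:-1], the target is tokens[1:]."""
    return token_ids[:-1], token_ids[1:]


def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


tokens = [11, 42, 7, 99]                 # hypothetical token ids
inputs, targets = next_token_pairs(tokens)
mask = causal_mask(len(inputs))

for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Each row of the mask allows one more position than the row above it, which is what stops a decoder-only model from peeking at the token it is being trained to predict.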

Mermaid source (live, editable)

flowchart TD
    PILE([The Pile<br/>9.8B tokens]):::data --> PRE{{Pretrain<br/>~400M base}}:::model
    PRE --> BASE[(base_pretrained.pt)]:::ckpt
    BASE --> SFT{{SFT<br/>Alpaca · Dolly · GSM8K}}:::model
    SFT --> SFTCK[(sft.pt)]:::ckpt
    SFTCK --> RM{{Reward Model<br/>Bradley-Terry}}:::rl
    SFTCK --> DPO{{DPO / ORPO / KTO<br/>preference}}:::rl
    RM --> RMCK[(reward.pt)]:::ckpt
    RMCK -->|reward signal| PPO{{PPO<br/>GAE + clip + KL}}:::rl
    SFTCK --> PPO
    SFTCK --> GRPO{{GRPO / RLVR<br/>group-relative}}:::rl
    PPO --> EVAL([GSM8K eval<br/>+ chat / inference]):::eval
    DPO --> EVAL
    GRPO --> EVAL
    classDef data fill:#d6ffd9,stroke:#27ae60,stroke-width:2px,color:#143d1a;
    classDef model fill:#ffe8a3,stroke:#d48806,stroke-width:2px,color:#5a3d00;
    classDef rl fill:#ffd9b3,stroke:#e67e22,stroke-width:2px,color:#6b3500;
    classDef ckpt fill:#eeeeee,stroke:#555555,stroke-width:2px,color:#222;
    classDef eval fill:#e8d6ff,stroke:#8e44ad,stroke-width:2px,color:#3d1a5a;
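
The objectives named in the flowchart (Bradley-Terry, DPO, GAE, the PPO clip, and GRPO's group-relative advantage) each reduce to a short formula. The sketch below writes them out on plain Python floats so the arithmetic is easy to follow. It is not the repository's code: the function names, the default values for `beta`, `gamma`, `lam`, and `eps`, and the use of the population standard deviation are all assumptions, and the KL penalty and the ORPO/KTO variants are left out.

```python
import math
import statistics


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def bradley_terry_loss(r_chosen, r_rejected):
    """Reward model: push the chosen response's score above the rejected one's."""
    return -log_sigmoid(r_chosen - r_rejected)


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO: the same pairwise form, applied to policy-vs-reference log-prob gaps."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation.

    `values` has one more entry than `rewards`: the bootstrap value of the
    state reached after the last step (0.0 if the episode ended).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate for one action (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one prompt's sample group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


print(bradley_terry_loss(2.0, 0.0))
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

In a real training loop these would be batched tensor operations on per-token log-probabilities, but the structure is the same: two pairwise losses for the preference stages, and an advantage estimate feeding a policy-gradient update for the RL stages.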

The stages, in order

| # | Stage | What it teaches the model | Doc |
|---|-------|---------------------------|-----|
| 1 | Pretraining | language itself (next-token prediction on the Pile) | 02_pretraining.md |
| 2 | SFT | `<think>`/`<answer>` format | 03_sft.md |
| 3 | Reward Model | | 04_reward_model.md |
| 4 | **DPO / ORPO / KTO** | without an RL loop | 05_dpo.md |
| 5 | PPO | | 06_ppo.md |
| 6 | **GRPO / RLVR** | | 07_grpo.md |
| | **Data pipeline** | | 01_data_pipeline.md |
| | **Evaluation** | | 08_evaluation.md |
| | **Inference / chat** | | 09_inference.md |

The one design rule: wrap, don't rewrite

Everything here sits on top of the original Transformer. I changed the educational model in exactly one place: I added a `forward_hidden` method that returns the final hidden states the `lm_head` consumes. Every post-training head (a value head for PPO, a scalar reward head for the reward model) and every RL log-prob computation composes around that one method, so the from-scratch model you already understand stays intact.
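
To make the wrapping pattern concrete, here is a minimal PyTorch sketch. Only `forward_hidden` and `lm_head` are names from the text above. `TinyLM` is a hypothetical stand-in for the base Transformer (an embedding with no attention layers), and `ScalarHeadWrapper` and `token_log_probs` are illustrative names, not the repository's actual classes.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in for the base model (embedding only, no attention)."""

    def __init__(self, vocab_size=100, d_model=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)

    def forward_hidden(self, input_ids):
        # Final hidden states, shape (batch, seq, d_model): what lm_head consumes.
        return self.embed(input_ids)

    def forward(self, input_ids):
        return self.lm_head(self.forward_hidden(input_ids))


class ScalarHeadWrapper(nn.Module):
    """Wraps the base model and adds one scalar per position.

    The same shape of head can serve as a PPO value head or a reward head.
    """

    def __init__(self, base, d_model=16):
        super().__init__()
        self.base = base
        self.head = nn.Linear(d_model, 1)

    def forward(self, input_ids):
        hidden = self.base.forward_hidden(input_ids)
        return self.head(hidden).squeeze(-1)          # (batch, seq)


def token_log_probs(base, input_ids):
    """Log-prob the base model assigns to each actual next token, (batch, seq - 1)."""
    logits = base(input_ids)[:, :-1, :]               # predictions for positions 1..seq-1
    log_probs = torch.log_softmax(logits, dim=-1)
    targets = input_ids[:, 1:].unsqueeze(-1)
    return log_probs.gather(-1, targets).squeeze(-1)


base = TinyLM()
ids = torch.randint(0, 100, (2, 5))                   # hypothetical batch of token ids
values = ScalarHeadWrapper(base)(ids)
logp = token_log_probs(base, ids)
print(values.shape, logp.shape)
```

The base class is never edited here. The wrapper reaches it only through `forward_hidden`, and the log-prob helper only through the ordinary forward pass, which is the property the design rule is after.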

Colour legend (used in every diagram in these docs)

🟩 data / corpus · 🟦 preprocessing · 🟦⬛ storage (HDF5 / JSONL) · 🟨 model / training loop · 🟧 RL / reward · 🟥 loss / objective · 🟪 evaluation · ⬜ checkpoint

Each diagram is a hand-drawn, colour-coded Mermaid sketch, pre-rendered to a PNG and embedded as an image. GitHub's live Mermaid doesn't reliably render `look: handDrawn`, and some viewers (e.g. the VS Code preview) block SVGs, so an embedded PNG shows everywhere. The editable Mermaid source sits in a collapsible "Mermaid source" block under each image. To regenerate the images after editing, see `diagrams/README.md`.

Run the whole thing

Once the base model has been pretrained (`02_pretraining.md`), the entire post-training chain runs from one script:

bash scripts/run_posttraining.sh          # SFT -> RM -> DPO -> PPO -> GRPO -> eval table

See POST_TRAINING.md for the condensed command reference.

source & further reading

FareedKhan-dev.github.io — original article

Read original on FareedKhan-dev.github.io → FareedKhan-dev.github.io/train-llm-from-scratch/
