Train LLM from Scratch

A developer trained a large language model from scratch using plain PyTorch, implementing the full post-training pipeline (SFT, reward modeling, DPO, PPO, and GRPO) on public datasets, all runnable on a single GPU or scaled with DDP. The project emphasizes a modular design that wraps the base transformer without rewriting it, enabling instruction following and reasoning capabilities.

Post-Training & Alignment: Overview

When I first trained this transformer from scratch, it could continue text, but it couldn't follow instructions or reason. That's what post-training fixes. This docs/ folder walks through the whole journey I built on top of the base model. Every stage is written from scratch in plain PyTorch (no trl, no peft, no transformers), trained on real public datasets, and runnable on a single GPU or scaled across multiple GPUs with DDP.

If you are new to LLM training internals, start with the new LLM Foundations section before reading the stage pages. It explains the token shapes, decoder-only Transformer, attention masks, objectives, optimization loop, and generation mechanics that every later page relies on.

Recommended reading order

1. Foundations first: Tokenization (foundations/tokenization/), Transformer (foundations/transformer/), Attention (foundations/attention/), Objectives (foundations/objectives/), Optimization (foundations/optimization/), Generation (foundations/generation/).
2. Then the full pipeline: Data (01 data pipeline/), Pretraining (02 pretraining/), SFT (03 sft/), Reward Model (04 reward model/), DPO (05 dpo/), PPO (06 ppo/), GRPO (07 grpo/).
3. Finally, run and inspect: Evaluation (08 evaluation/), Inference / Chat (09 inference/), and the command cheatsheet (howto/commands/).

The pipeline mirrors how modern aligned/reasoning models are actually built:

flowchart TD PILE The Pile