Open Reproduction of DeepSeek-R1 Hugging Face has released Open R1, an open-source reproduction of DeepSeek's R1 reasoning model, completing the first of three planned steps. The project includes a 350k-trace reasoning dataset distilled from DeepSeek-R1 and the OpenR1-Distill-7B model that matches the performance of DeepSeek's distilled 7B model. The initiative aims to make the full R1 pipeline reproducible and accessible to the broader AI community. A fully open reproduction of DeepSeek-R1. This repo is a work in progress, let's build it together Table of Contents Overview overview Plan of attack plan-of-attack Installation installation Training models training-models Evaluating models evaluating-models Reproducing Deepseek's evaluation results reproducing-deepseeks-evaluation-results Data generation data-generation Contributing contributing The goal of this repo is to build the missing pieces of the R1 pipeline such that everybody can reproduce and build on top of it. The project is simple by design and mostly consists of: src/open r1 : contains the scripts to train models as well as generate synthetic data: grpo.py : trains a model with GRPO on a given dataset. sft.py : performs a simple SFT of a model on a dataset. generate.py : generates synthetic data from a model using Distilabel https://github.com/argilla-io/distilabel . Makefile : contains easy-to-run commands for each step in the R1 pipeline leveraging the scripts above. We will use the DeepSeek-R1 tech report https://github.com/deepseek-ai/DeepSeek-R1 as a guide, which can roughly be broken down into three main steps: - Step 1: replicate the R1-Distill models by distilling a high-quality corpus from DeepSeek-R1. - Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will likely involve curating new, large-scale datasets for math, reasoning, and code. - Step 3: show we can go from base model to RL-tuned via multi-stage training. 🧑🍳 2025/05/26 Step 1 completed We release--a curated reasoning dataset of 350k verified traces distilled from R1. The dataset spans tasks in mathematics, coding, and science, and is designed to teach language models to reason step-by-step. We also provide a recipe to train Mixture-of-Thoughts OpenR1-Distill-7B https://huggingface.co/open-r1/OpenR1-Distill-7B , which replicates the reasoning capabilities of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and marks the completion of step 1 in the Open R1 project. ⚡️ 2025/03/11 We release the update 3 https://huggingface.co/blog/open-r1/update-3 :dataset of 10k competitive programming problems and 100k solutions distilled from R1. We also release IOI24: a new benchmark of CodeForces-CoTs very hard problems from international olympiads. A 7B Qwen model trained on CodeForces-CoTs can outperform Claude 3.7 Sonnet on IOI24, while a 32B model can outperform R1 itself. ∞ 2025/02/10 We release the update 2 https://huggingface.co/blog/open-r1/update-2 :dataset of 220k traces distilled from R1 on a new version of NuminaMath. Models trained on this dataset match the performance of DeepSeek's distilled ones. OpenR1-Math-220k 🔥 2025/02/02 We implement the first parts of the update 1 https://huggingface.co/blog/open-r1/update-1 : training https://github.com/huggingface/open-r1?tab=readme-ov-file training-models , inference https://github.com/huggingface/open-r1?tab=readme-ov-file data-generation , and evaluation https://github.com/huggingface/open-r1?tab=readme-ov-file reproducing-deepseeks-evaluation-results pipelines. Let's go Caution Libraries rely on CUDA 12.4. If you see errors related to segmentation faults, double check the version your system is running with nvcc --version . To run the code in this project, first, create a Python virtual environment using e.g. uv . To install uv , follow the UV Installation Guide https://docs.astral.sh/uv/getting-started/installation/ . Note As a shortcut, run make install to setup development libraries spelled out below . Afterwards, if everything is setup correctly you can try out the Open-R1 models. uv venv openr1 --python 3.11 && source openr1/bin/activate && uv pip install --upgrade pip Tip For Hugging Face cluster users, add export UV LINK MODE=copy to your .bashrc to suppress cache warnings from uv Next, install vLLM and FlashAttention: uv pip install vllm==0.8.5.post1 uv pip install setuptools && uv pip install flash-attn --no-build-isolation This will also install PyTorch v2.6.0 and it is very important to use this version since the vLLM binaries are compiled for it. You can then install the remaining dependencies for your specific use case via pip install -e . LIST OF MODES . For most contributors, we recommend: GIT LFS SKIP SMUDGE=1 uv pip install -e ". dev " Next, log into your Hugging Face and Weights and Biases accounts as follows: huggingface-cli login wandb login Finally, check whether your system has Git LFS installed so that you can load and push models/datasets to the Hugging Face Hub: git-lfs --version If it isn't installed, run: sudo apt-get install git-lfs Note The training commands below are configured for a node of 8 x H100s 80GB . For different hardware and topologies, you may need to tune the batch size and number of gradient accumulation steps. We support training models with either DDP or DeepSpeed ZeRO-2 and ZeRO-3 . For example, to perform SFT on a dataset distilled from DeepSeek-R1 with reasoning traces such as open-r1/Mixture-of-Thoughts https://huggingface.co/datasets/open-r1/Mixture-of-Thoughts , run: Train via command line accelerate launch --config file=recipes/accelerate configs/zero3.yaml src/open r1/sft.py \ --model name or path open-r1/Qwen2.5-Math-7B-RoPE-300k \ --dataset name open-r1/Mixture-of-Thoughts \ --dataset config all \ --eos token '<|im end| ' \ --learning rate 4.0e-5 \ --num train epochs 5 \ --max seq length 32768 \ --per device train batch size 2 \ --gradient checkpointing \ --bf16 \ --use liger kernel \ --output dir data/OpenR1-Distill-7B Train via YAML config accelerate launch --config file recipes/accelerate configs/zero3.yaml src/open r1/sft.py \ --config recipes/OpenR1-Distill-7B/sft/config distill.yaml Currently, the following tasks are supported: - Supervised Fine-Tuning sft - Group Relative Policy Optimization grpo Tip If you scale up/down the number of GPUs, we recommend also scaling up the per-device batch size or number of gradient accumulation steps to keep the global batch size constant. By default, these scripts will push each model to your Hugging Face Hub username, i.e. {username}/{model name}-{task} . You can override the parameters in each YAML config by appending them to the command as follows: Change the base model to a smaller variant accelerate launch --config file recipes/accelerate configs/zero3.yaml src/open r1/sft.py \ --config recipes/OpenR1-Distill-7B/sft/config distill.yaml \ --model name or path Qwen/Qwen3-0.6B-Base \ --hub model id OpenR1-Distill-0.6B \ --output dir data/OpenR1-Distill-0.6B If you also wish to override the Weights and Biases default settings, you can do so as follows: accelerate launch --config file recipes/accelerate configs/zero3.yaml src/open r1/sft.py \ --config recipes/OpenR1-Distill-7B/sft/config distill.yaml --wandb entity huggingface --wandb project open-r1 --run name Qwen2.5-1.5B-GRPO 🚨 WARNING 🚨 Most base models like meta-llama/Llama-3.2-1B do not have a chat template, so we set ChatML as the default during training. However, for Qwen base models like Qwen/Qwen2.5-1.5B , a chat template is pre-defined in the tokenizer, so the EOS token must be set accordingly, e.g. Align EOS token with chat template for Qwen base models accelerate launch --config file=recipes/accelerate configs/zero3.yaml src/open r1/sft.py \ --model name or path Qwen/Qwen2.5-1.5B \ + --eos token '<|im end| ' --dataset name open-r1/Mixture-of-Thoughts \ --dataset config all \ --learning rate 4.0e-5 \ --num train epochs 1 \ --max seq length 32768 \ --per device train batch size 16 \ --gradient checkpointing \ --bf16 \ --use liger kernel \ --output dir data/Qwen2.5-1.5B-Open-R1-Distill If you wish to use a custom chat template e.g. Llama or Gemma , then the chat template and associated EOS token must be provided: Align EOS token with custom chat template accelerate launch --config file=recipes/accelerate configs/zero3.yaml src/open r1/sft.py \ --model name or path meta-llama/Llama-3.2-1B \ + --chat template "$ cat llama chat template.jinja " \ + --eos token '<|eot id| ' \ --dataset name open-r1/Mixture-of-Thoughts \ --dataset config all \ --learning rate 4.0e-5 \ --num train epochs 1 \ --max seq length 32768 \ --per device train batch size 16 \ --gradient checkpointing \ --bf16 \ --use liger kernel \ --output dir data/Llama-3.2-1B-Open-R1-Distill We provide a recipe to reproduce the reasoning capabilities of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B , starting from the same base model. To do so, run: ACCELERATE LOG LEVEL=info accelerate launch --config file recipes/accelerate configs/zero3.yaml \ src/open r1/sft.py \ --config recipes/OpenR1-Distill-7B/sft/config distill.yaml The result will be a model like open-r1/OpenR1-Distill-7B https://huggingface.co/open-r1/OpenR1-Distill-7B , with the following downstream performance: | Model | AIME 2024 | MATH-500 | GPQA Diamond | LiveCodeBench v5 | |---|---|---|---|---| | OpenR1-Distill-7B | 52.7 | 89.0 | 52.8 | 39.4 | | DeepSeek-R1-Distill-Qwen-7B | 51.3 | 93.5 | 52.4 | 37.4 | You can adjust the YAML config to train on a different base model or dataset. We use TRL's vLLM backend https://huggingface.co/docs/trl/speeding up training?vllm+examples=GRPO vllm-for-fast-generation-in-online-methods to scale training to large models across multiple nodes. For single-node training of smol models across 8 GPUs, use vllm mode="colocate" to run vLLM in the same process as the training script: ACCELERATE LOG LEVEL=info \ accelerate launch --config file recipes/accelerate configs/zero3.yaml \ src/open r1/grpo.py --config recipes/DeepSeek-R1-Distill-Qwen-1.5B/grpo/config demo.yaml \ --vllm mode colocate Warning The chat template used in the distilled DeepSeek models omits the contents of the reasoning block within the