DeepSpec is a full-stack codebase for training and evaluating draft models for speculative decoding. It contains data preparation utilities, draft model implementations, training code, and evaluation scripts.
Install the Python dependencies:
python -m pip install -r requirements.txt
Data preparation additionally requires an inference engine to serve the target model when regenerating answers; see scripts/data/README.md for details.
Run the stages in order; each stage's output feeds the next:
1. **Data Preparation**: download prompts, regenerate target answers, and build the target cache.
2. **Training**: train a draft model against the cached target outputs.
3. **Evaluation**: measure speculative-decoding acceptance on benchmark tasks.
See scripts/data/README.md for the step-by-step data pipeline:
- download and split training data,
- regenerate answers,
- prepare the target cache (storage warning: this can be very large, roughly 38 TB for the default Qwen/Qwen3-4B setting).
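Because the default cache is so large, it can help to check free disk space before starting the cache build. The following is a minimal sketch and not part of DeepSpec: the helper name has_room_for_cache, the ~/target_cache location, and the reading of "38 TB" as 38 × 10^12 bytes are all assumptions made for illustration.

```python
import os
import shutil
from pathlib import Path

# Rough figure quoted above for the default Qwen/Qwen3-4B setting.
# Interpreting "TB" as decimal terabytes is an assumption.
DEFAULT_CACHE_BYTES = 38 * 10**12


def has_room_for_cache(cache_dir, required_bytes=DEFAULT_CACHE_BYTES):
    """Return True if the filesystem that would hold cache_dir has enough free space."""
    path = Path(os.path.expanduser(cache_dir))
    # Walk up to the nearest existing ancestor so the check also works
    # before the cache directory has been created.
    while not path.exists():
        path = path.parent
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    cache_dir = "~/target_cache"  # hypothetical location, not a DeepSpec default
    if has_room_for_cache(cache_dir):
        print("enough free space for the default cache size")
    else:
        print("not enough free space for the default cache size")
```

The check only looks at the filesystem's current free space; the actual cache size depends on the target model and dataset, so treat the 38 TB figure as a rough lower bound to plan around rather than an exact requirement.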
bash scripts/train/train.sh
train.sh launches train.py, which spawns one worker per visible GPU. Select the algorithm and target model by pointing config_path at one of the configs under config/ (e.g. config/dspark/dspark_qwen3_4b.py); see the script header for the full list of configs, how to override config_path / target_cache_dir, and how to use --opts to override individual config fields. Checkpoints are written to ~/checkpoints/<project_name>/<exp_name>/step_*.
Hardware: the default configs and scripts assume a single node with 8 GPUs. To train on fewer GPUs, list fewer devices in CUDA_VISIBLE_DEVICES.
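As an illustration of restricting the visible GPUs, the sketch below builds an environment with a shorter CUDA_VISIBLE_DEVICES list that could be passed to the training script. The helper names restrict_visible_gpus and listed_gpu_count are hypothetical, and the sketch assumes train.sh honors a CUDA_VISIBLE_DEVICES value inherited from its environment; check the script header to confirm before relying on it.

```python
import os


def restrict_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set to gpu_ids."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def listed_gpu_count(env):
    """Count the device ids listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because CUDA then exposes
    every GPU on the machine and the count cannot be read from the env.
    """
    if "CUDA_VISIBLE_DEVICES" not in env:
        return None
    value = env["CUDA_VISIBLE_DEVICES"]
    return len([d for d in value.split(",") if d.strip()])


if __name__ == "__main__":
    env = restrict_visible_gpus([0, 1, 2, 3])
    print(env["CUDA_VISIBLE_DEVICES"])  # 0,1,2,3
    print(listed_gpu_count(env))        # 4
    # To launch training with this environment (not executed here):
    #   import subprocess
    #   subprocess.run(["bash", "scripts/train/train.sh"], env=env, check=True)
```

Since train.py spawns one worker per visible GPU, the four-device environment above would be expected to yield four workers.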
bash scripts/eval/eval.sh
eval.sh runs eval.py against a trained draft checkpoint over the speculative-decoding benchmarks in eval_datasets/ (gsm8k, math500, aime25, humaneval, mbpp, livecodebench, mt-bench, alpaca, arena-hard-v2). Set:
- target_name_or_path: the target model the draft was trained against (e.g. Qwen/Qwen3-4B),
- draft_name_or_path: the draft checkpoint, e.g. ~/checkpoints/deepspec/dspark_block8_qwen3_4b/step_latest.
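When choosing a value for draft_name_or_path, it can be handy to list the checkpoints an experiment has produced. The sketch below is illustrative only: the helper list_step_checkpoints is hypothetical, and it assumes that numbered checkpoints are directories named step_<N> with an integer N, which this README does not confirm (it only shows step_* and step_latest). Entries such as step_latest are skipped by the numeric filter.

```python
import os
import re
import tempfile
from pathlib import Path


def list_step_checkpoints(exp_dir):
    """Return step_<N> directories under exp_dir, sorted by N ascending.

    Assumes numbered checkpoints are named step_<integer>; other entries
    (for example step_latest) are ignored.
    """
    root = Path(os.path.expanduser(exp_dir))
    if not root.is_dir():
        return []
    found = []
    for child in root.iterdir():
        match = re.fullmatch(r"step_(\d+)", child.name)
        if match and child.is_dir():
            found.append((int(match.group(1)), child))
    # Sort numerically so that step_100 comes after step_20.
    return [path for _, path in sorted(found, key=lambda item: item[0])]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real experiment.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("step_20", "step_100", "step_latest"):
            (Path(tmp) / name).mkdir()
        print([p.name for p in list_step_checkpoints(tmp)])
        # prints ['step_20', 'step_100']
```

Sorting on the parsed integer rather than the directory name avoids the lexicographic ordering in which step_100 would sort before step_20.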
Currently, DeepSpec includes three draft models: DSpark, DFlash and Eagle3.
DeepSpec is released under the MIT License. It includes code adapted from third-party projects under their own licenses; see NOTICE for the full attribution.
DeepSpec builds on the ideas and code of several excellent open-source projects:
- SpecForge (Apache-2.0): the overall training framework and Eagle3 implementation; portions of the Eagle3 modeling, loss, optimizer, attention, and evaluation code are adapted from it. Adapted files carry an in-file attribution comment, and the full notice is recorded in NOTICE.
- DFlash (MIT): the DFlash draft-model design and training recipe.
- Qwen3 and Gemma: the target model families supported in this repo.
We thank the authors and maintainers of these projects. Contributions of new algorithms are welcome.