Diverse reasoning traces teach LLMs to make better decisions

Researchers at Amazon have developed a method to train large language models to generate multiple, diverse reasoning paths for the same problem, improving their decision-making accuracy. The team introduced global forking tokens and a set-supervised fine-tuning approach to prevent the models from collapsing into a single reasoning pattern, achieving 5% to 7% gains in single-shot accuracy on standard benchmarks. This advance addresses a limitation of traditional supervised fine-tuning, which trains models on a single reasoning trace per question, by enabling models to learn and select from distinct reasoning strategies.

Large language models (LLMs) are pretrained on huge volumes of unlabeled data, but afterward, they’re typically post-trained on specific tasks such as instruction following, avoiding harmful outputs, and reasoning, or providing justifications for the outputs they generate.

Parallel reasoning, in which multiple, diverse reasoning paths are generated and compared for the same problem, is emerging as a key tool for understanding the limits of LLMs’ reasoning capability. It also underpins techniques for testing LLMs such as self-consistency, where multiple reasoning paths are aggregated to improve accuracy.

LLMs are generally optimized for reasoning through supervised fine-tuning (SFT), in which each training example is labeled with a single, human-verified reasoning trace. Given the usefulness of parallel reasoning for evaluation, the question naturally arises: Can we expand the limits of LLMs’ reasoning capacities by training them on diverse reasoning traces for each question?
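For readers unfamiliar with self-consistency, the aggregation step mentioned above is often implemented as a simple majority vote over the final answers of several sampled reasoning paths. The following is a minimal sketch under that assumption; the function name `self_consistency_vote` and the sample answers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def self_consistency_vote(final_answers):
    """Return the most common final answer and its share of the votes.

    `final_answers` holds one extracted answer per sampled reasoning path.
    Ties are broken in favor of the answer that appeared first.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)


# Hypothetical final answers extracted from five sampled reasoning paths.
sampled = ["42", "42", "17", "42", "36"]
best, share = self_consistency_vote(sampled)
print(best, share)
```

A vote of this kind is applied at inference time and leaves the model's training untouched, whereas the question posed above concerns what happens when diverse traces are used during training itself.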
In a paper (https://www.amazon.science/publications/training-large-language-models-to-reason-in-parallel-with-global-forking-tokens) we presented at this year’s International Conference on Learning Representations (ICLR, https://www.amazon.science/conferences-and-events/iclr-2026), we propose a method for doing just that, one that avoids some previously identified pitfalls of parallel reasoning. To prompt a single LLM to adopt different reasoning strategies, we introduce a set of global forking tokens such as