{"slug": "maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host", "title": "MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs", "summary": "MaxText has introduced new post-training capabilities, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), now available on single-host TPU configurations like v5p-8 and v6e-8. These features, built with JAX and the Tunix library, allow developers to adapt pre-trained models for specialized tasks or complex reasoning, such as math and coding, with minimal setup. The workflows are designed to scale seamlessly from single-host to multi-host configurations for larger models and datasets.", "body_md": "In the rapidly evolving landscape of large language models (LLMs), pre-training is only the first step. To transform a base model into a specialized assistant or a high-performing reasoning engine, post-training is essential. Today, we are excited to announce new features in [MaxText](https://github.com/AI-Hypercomputer/maxtext) that streamline this process: **Supervised Fine-Tuning (SFT)** and **Reinforcement Learning (RL)**, now available on single-host TPU configurations (such as v5p-8 and v6e-8).\n\nBuilt on JAX and the [Tunix](https://github.com/google/tunix/tree/main) library, MaxText provides a high-performance, scalable path for developers to refine their models with the latest post-training techniques. See the full documentation for [SFT](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/tutorials/posttraining/sft.html) and [RL](https://maxtext.readthedocs.io/en/maxtext-v0.2.1/tutorials/posttraining/rl.html) to start post-training on TPUs today.\n\nSupervised Fine-Tuning is the primary method for adapting a pre-trained model to follow specific instructions or to excel at niche tasks. 
With the new single-host SFT support, you can take an existing MaxText or Hugging Face checkpoint and fine-tune it on a labeled dataset with minimal setup.\n\nFor tasks that require complex logic and reasoning, such as math or coding, Reinforcement Learning is especially valuable. MaxText now supports several state-of-the-art RL algorithms on single-host TPUs, using **vLLM** for high-throughput inference during the training loop. For example, the RL command below selects its loss with `loss_algo=gspo-token`.\n\nTo get started, install the post-training dependencies (the commands below pin MaxText 0.2.1):\n\n```\nuv pip install maxtext[tpu-post-train]==0.2.1 --resolution=lowest\ninstall_maxtext_tpu_post_train_extra_deps\n```\n\nYou can launch an SFT run with the `train_sft` module, specifying the model, the checkpoint to load, a run name, and an output directory:\n\n```\npython3 -m maxtext.trainers.post_train.sft.train_sft \\\n   model_name=${MODEL?} \\\n   load_parameters_path=${MAXTEXT_CKPT_PATH?} \\\n   run_name=${RUN_NAME?} \\\n   base_output_directory=${BASE_OUTPUT_DIRECTORY?}\n```\n\nFor RL, the `train_rl` module loads the policy and reference models, runs training, and automatically evaluates the trained model on reasoning benchmarks:\n\n```\npython3 -m maxtext.trainers.post_train.rl.train_rl \\\n  model_name=${MODEL?} \\\n  load_parameters_path=${MAXTEXT_CKPT_PATH?} \\\n  run_name=${RUN_NAME?} \\\n  base_output_directory=${BASE_OUTPUT_DIRECTORY?} \\\n  loss_algo=gspo-token \\\n  chips_per_vm=${CHIPS_PER_VM?}\n```\n\nSingle-host support is a powerful entry point for many developers, but MaxText is built for scale: the same workflows are designed to carry over to multi-host configurations for training larger models on massive datasets.\n\n
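Both launch commands above read their settings through `${VAR?}` expansions, so the shell refuses to run them if a variable is unset. The bash script below is a minimal sketch and is not part of the official tutorials. It sets those variables, demonstrates the fail-fast behavior, and prints the SFT and RL commands for a dry run. Every value is a hypothetical placeholder: the model name, the bucket paths, the run name, and `CHIPS_PER_VM=4`, which was picked only because a v5p-8 host has 4 chips. Check the SFT and RL tutorials for the values your model and topology need.

```shell
#!/usr/bin/env bash
# Dry-run sketch: every value below is a hypothetical placeholder, not a recommended setting.
MODEL=my-model                                        # hypothetical model name
MAXTEXT_CKPT_PATH=gs://my-bucket/checkpoints/0/items  # hypothetical checkpoint path
RUN_NAME=post-train-demo                              # any unique run name
BASE_OUTPUT_DIRECTORY=gs://my-bucket/output           # hypothetical output bucket
CHIPS_PER_VM=4                                        # placeholder; a v5p-8 host has 4 chips

# ${VAR?} makes a non-interactive shell exit with an error when VAR is unset.
# Checking it in a subshell shows that a missing setting fails fast
# instead of launching a half-configured job.
if ( unset MODEL; : ${MODEL?} ) 2>/dev/null; then
  echo 'unexpected: unset MODEL was accepted'
else
  echo 'ok: unset MODEL is rejected'
fi

# Assemble the SFT and RL launch commands as bash arrays so they can be inspected first.
sft_cmd=(python3 -m maxtext.trainers.post_train.sft.train_sft
  model_name=${MODEL?}
  load_parameters_path=${MAXTEXT_CKPT_PATH?}
  run_name=${RUN_NAME?}
  base_output_directory=${BASE_OUTPUT_DIRECTORY?})

rl_cmd=(python3 -m maxtext.trainers.post_train.rl.train_rl
  model_name=${MODEL?}
  load_parameters_path=${MAXTEXT_CKPT_PATH?}
  run_name=${RUN_NAME?}
  base_output_directory=${BASE_OUTPUT_DIRECTORY?}
  loss_algo=gspo-token
  chips_per_vm=${CHIPS_PER_VM?})

# Print the commands instead of running them. The placeholder values contain no
# spaces or glob characters, so the unquoted expansions below are safe here.
echo ${sft_cmd[@]}
echo ${rl_cmd[@]}
```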
Stay tuned for further updates in this direction.", "url": "https://wpnews.pro/news/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host", "canonical_source": "https://developers.googleblog.com/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host-tpus/", "published_at": "2026-05-22 08:35:29.613681+00:00", "updated_at": "2026-05-22 08:35:33.524233+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "developer-tools", "cloud-computing"], "entities": ["MaxText", "TPU", "JAX", "Tunix", "Hugging Face", "vLLM", "SFT", "RL"], "alternates": {"html": "https://wpnews.pro/news/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host", "markdown": "https://wpnews.pro/news/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host.md", "text": "https://wpnews.pro/news/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host.txt", "jsonld": "https://wpnews.pro/news/maxtext-expands-post-training-capabilities-introducing-sft-and-rl-on-single-host.jsonld"}}