MaxText Expands Post-Training Capabilities: Introducing SFT and RL on Single-Host TPUs

Source: developers.googleblog.com | Published: 2026-05-22 08:35 UTC

MaxText has introduced new post-training capabilities, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), now available on single-host TPU configurations like v5p-8 and v6e-8. These features, built with JAX and the Tunix library, allow developers to adapt pre-trained models for specialized tasks or complex reasoning, such as math and coding, with minimal setup. The workflows are designed to scale seamlessly from single-host to multi-host configurations for larger models and datasets.

In the rapidly evolving landscape of large language models (LLMs), pre-training is only the first step. To transform a base model into a specialized assistant or a high-performing reasoning engine, post-training is essential. Today, we are excited to announce new features in MaxText that streamline this process: Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), now available on single-host TPU configurations (such as v5p-8 and v6e-8).

By leveraging the power of JAX and the efficiency of the Tunix library, MaxText provides a high-performance, scalable path for developers to refine their models using the latest post-training techniques. You can explore the full documentation for SFT and RL to start your post-training journey on TPUs today.

Supervised Fine-Tuning is the primary method for adapting a pre-trained model to follow specific instructions or excel at niche tasks. With the new single-host SFT support, users can take an existing MaxText or Hugging Face checkpoint and fine-tune it on labeled datasets with minimal setup.

For tasks requiring complex logic and reasoning, such as math or coding, Reinforcement Learning is particularly well suited. MaxText now supports several state-of-the-art RL algorithms on single-host TPUs, using vLLM for high-throughput inference during the training loop.

To begin using these new features, ensure you have the latest post-training dependencies installed:

uv pip install 'maxtext[tpu-post-train]==0.2.1' --resolution=lowest
install_maxtext_tpu_post_train_extra_deps
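
Before launching a run, it can help to confirm that the package landed in the Python environment you are about to use. The following is a minimal sketch, assuming `python3` on your PATH is the interpreter that received the install. `has_module` is a hypothetical helper name introduced here for illustration; it is not a MaxText command.

```shell
# has_module is a hypothetical helper name, not a MaxText command.
# It asks the active interpreter whether a top-level module can be located,
# without importing it. Returns 0 if found, 1 otherwise.
has_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

if has_module maxtext; then
  echo "maxtext is importable"
else
  echo "maxtext not found; re-run the install step in this environment" >&2
fi
```

The check only locates the module; it does not verify that the post-training extras or TPU runtime are working.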

You can launch an SFT run using the train_sft module, specifying your model, dataset, and output directory:

python3 -m maxtext.trainers.post_train.sft.train_sft \
   model_name=${MODEL?} \
   load_parameters_path=${MAXTEXT_CKPT_PATH?} \
   run_name=${RUN_NAME?} \
   base_output_directory=${BASE_OUTPUT_DIRECTORY?}
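
The `${MODEL?}` form in the command above is standard shell parameter expansion: if the variable is unset, the shell reports an error instead of passing an empty value to the trainer. It stops at the first unset variable, though, so a small pre-flight check that lists every missing one at once can save a few round trips. This is a sketch only. `require_vars` is a hypothetical helper, and every value below is a placeholder, not a confirmed model name or a recommended path.

```shell
# require_vars is a hypothetical helper, not part of MaxText.
# It reports every unset or empty variable, then returns non-zero if any
# were missing.
require_vars() {
  local missing name value
  missing=0
  for name in "$@"; do
    # Indirect lookup: expands to the value of the variable named by $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Placeholder values for illustration only; substitute your own.
export MODEL="my-model"                                # hypothetical model name
export MAXTEXT_CKPT_PATH="gs://my-bucket/ckpt"         # hypothetical checkpoint path
export RUN_NAME="sft-demo"                             # hypothetical run name
export BASE_OUTPUT_DIRECTORY="gs://my-bucket/output"   # hypothetical output location

require_vars MODEL MAXTEXT_CKPT_PATH RUN_NAME BASE_OUTPUT_DIRECTORY \
  && echo "all SFT variables are set"
```

Unlike `${VAR?}`, this helper also treats an empty string as missing, which is usually what you want for paths and names.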

For RL, the train_rl module handles the loading of policy and reference models, executes the training, and provides automated evaluation on reasoning benchmarks:

python3 -m maxtext.trainers.post_train.rl.train_rl \
  model_name=${MODEL?} \
  load_parameters_path=${MAXTEXT_CKPT_PATH?} \
  run_name=${RUN_NAME?} \
  base_output_directory=${BASE_OUTPUT_DIRECTORY?} \
  loss_algo=gspo-token \
  chips_per_vm=${CHIPS_PER_VM?}
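
Because an RL run occupies the TPU host for both inference and training, it may be worth reviewing the fully expanded command before executing it. The sketch below wraps the same invocation in a function that prints the command by default and only runs it when asked. `launch_rl` and `DRY_RUN` are hypothetical names for this illustration, not MaxText features, and all variable values are placeholders.

```shell
# launch_rl and DRY_RUN are hypothetical names for this sketch.
launch_rl() {
  # Build the argument list once so the printed and executed commands match.
  # ${VAR?} makes a non-interactive shell exit with an error if VAR is unset.
  set -- python3 -m maxtext.trainers.post_train.rl.train_rl \
    "model_name=${MODEL?}" \
    "load_parameters_path=${MAXTEXT_CKPT_PATH?}" \
    "run_name=${RUN_NAME?}" \
    "base_output_directory=${BASE_OUTPUT_DIRECTORY?}" \
    "loss_algo=gspo-token" \
    "chips_per_vm=${CHIPS_PER_VM?}"

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # print only; nothing is launched
  else
    "$@"                 # run the training module
  fi
}

# Placeholder values for illustration only; substitute your own.
MODEL="my-model"                               # hypothetical model name
MAXTEXT_CKPT_PATH="gs://my-bucket/ckpt"        # hypothetical checkpoint path
RUN_NAME="rl-demo"                             # hypothetical run name
BASE_OUTPUT_DIRECTORY="gs://my-bucket/output"  # hypothetical output location
CHIPS_PER_VM=4                                 # placeholder; depends on your TPU host
DRY_RUN=1

launch_rl
```

Setting `DRY_RUN=0` before calling `launch_rl` would execute the command instead of printing it. Defaulting to the dry run keeps the sketch safe to paste.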

While single-host support provides a powerful entry point for many developers, MaxText is built for scale. These same workflows are designed to transition seamlessly to multi-host configurations for those training larger models on massive datasets. Stay tuned for further updates on multi-host support.
