Trainsafe — behavioral health checks for HuggingFace/TRL fine-tuning

Developer Ammar Hassona released Trainsafe, a TrainerCallback for HuggingFace/TRL that runs behavioral health checks during fine-tuning, catching failures like language drift and output collapse that loss metrics miss. The tool integrates with SFTTrainer, DPOTrainer, GRPOTrainer, and base Trainer, and is available via pip install trainsafe.
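Language drift, for example an Arabic model that starts answering in Chinese, can be caught without a language-ID model when the target language has a distinctive script. The sketch below illustrates that idea under this assumption; it is not trainsafe's implementation. It measures what share of the letters in each sampled output belong to the expected Unicode script and flags the checkpoint when the average share drops below a threshold. The function names, the script list, and the 0.5 threshold are all hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple

# Hypothetical coarse script labels, matched against Unicode character-name prefixes
# (e.g. "ARABIC LETTER MEEM", "CJK UNIFIED IDEOGRAPH-4E2D", "LATIN SMALL LETTER A").
SCRIPTS = ("ARABIC", "CJK", "LATIN", "CYRILLIC", "HIRAGANA", "KATAKANA", "HANGUL")


def script_of(ch: str) -> Optional[str]:
    """Return a coarse script label for a letter, or None for non-letters."""
    if not ch.isalpha():
        return None  # skip spaces, digits, punctuation, and combining marks
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def script_share(text: str, expected: str) -> float:
    """Fraction of letters in `text` whose script label equals `expected`."""
    scripts = [s for s in map(script_of, text) if s is not None]
    if not scripts:
        return 0.0  # an output with no letters at all counts as zero share
    return sum(s == expected for s in scripts) / len(scripts)


def language_drift(outputs: List[str], expected: str = "ARABIC",
                   min_share: float = 0.5) -> Tuple[bool, float]:
    """Hypothetical check: flag drift when the mean expected-script share is below min_share."""
    if not outputs:
        return False, 0.0
    mean = sum(script_share(o, expected) for o in outputs) / len(outputs)
    return mean < min_share, mean
```

For instance, if two of three sampled answers from an Arabic model come back in Chinese characters, the mean Arabic share is 1/3 and the checkpoint is flagged. A script heuristic cannot separate languages that share a script (Arabic vs. Persian, English vs. French); that case would need a proper language-ID model.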

I was fine-tuning an Arabic model with DPO. The loss looked perfect for the entire run. Training finished, and the model spoke Chinese.

So I built trainsafe, a TrainerCallback that runs behavioral checks at every eval checkpoint and catches failures that the loss never surfaces: language drift, output collapse, repetition loops, prompt echoing, and format drift.

It takes two lines to add to any existing training script:

```python
from trainsafe import TrainSafeCallback

trainer = SFTTrainer(..., callbacks=[TrainSafeCallback()])
```

It works with SFTTrainer, DPOTrainer, GRPOTrainer, and the base Trainer.

pip install trainsafe

GitHub: https://github.com/AmmarHassona/trainsafe (behavioral health checks for HuggingFace / TRL fine-tuning; it monitors outputs at each checkpoint and stops training if something goes wrong)

Happy to answer questions. It's still early, so any comments or feedback are welcome.
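Several of the failure modes in that list can be approximated with cheap text heuristics on a handful of generated samples. The sketch below is hypothetical and is not how trainsafe implements its checks. Repeated word trigrams stand in for repetition loops, a verbatim prefix match stands in for prompt echoing, and the share of distinct outputs across different prompts stands in for output collapse. Format drift is left out because it depends on what output format is expected. All names and thresholds are placeholders.

```python
from typing import List, Sequence


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Share of word n-grams that repeat an earlier one; values near 1.0 suggest a loop."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0  # too short to contain any n-gram
    return 1.0 - len(set(ngrams)) / len(ngrams)


def echoes_prompt(prompt: str, output: str) -> bool:
    """True if the output begins by repeating the (non-empty) prompt verbatim."""
    p = prompt.strip()
    return bool(p) and output.strip().startswith(p)


def distinct_output_ratio(outputs: Sequence[str]) -> float:
    """Unique outputs / total outputs; low values across varied prompts suggest collapse."""
    if not outputs:
        return 1.0
    return len({o.strip() for o in outputs}) / len(outputs)


def run_checks(prompts: Sequence[str], outputs: Sequence[str],
               max_repeat: float = 0.5, min_distinct: float = 0.5) -> List[str]:
    """Return the names of the hypothetical checks that fail for one checkpoint."""
    failures = []
    if any(repeated_ngram_ratio(o) > max_repeat for o in outputs):
        failures.append("repetition_loop")
    if any(echoes_prompt(p, o) for p, o in zip(prompts, outputs)):
        failures.append("prompt_echo")
    if distinct_output_ratio(outputs) < min_distinct:
        failures.append("output_collapse")
    return failures
```

For example, `run_checks(["q1", "q2", "q3"], ["same answer"] * 3)` returns `["output_collapse"]`. In an actual callback, checks like these would run on generations sampled at each evaluation step, and a non-empty result could halt training by setting the `control.should_training_stop` flag that transformers' `TrainerCallback` hooks receive.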