I was fine-tuning an Arabic model using DPO. Loss looked perfect the
entire run. Training finished. It spoke Chinese.
So I built trainsafe — a TrainerCallback that runs behavioral checks
at every eval checkpoint and catches failures that loss never surfaces:
language drift, output collapse, repetition loops, prompt echoing,
format drift.
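To make those failure modes concrete, here is a rough standard-library sketch of what a few such checks could look like. It is not trainsafe's actual implementation: the function names (`arabic_ratio`, `max_ngram_repeats`, `check_outputs`) and the thresholds are hypothetical, chosen only for illustration. In a real callback, logic like this would presumably run from the `on_evaluate` hook of a `TrainerCallback` on a handful of generated samples.

```python
import unicodedata
from collections import Counter


def arabic_ratio(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name starts with ARABIC."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith("ARABIC")
    )
    return arabic / len(letters)


def max_ngram_repeats(text: str, n: int = 3) -> int:
    """Count of the most frequent whitespace-token n-gram (0 if text is too short)."""
    tokens = text.split()
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0
    return max(Counter(grams).values())


def check_outputs(prompts, outputs, min_arabic=0.5, max_repeats=3):
    """Return a sorted list of failure labels; thresholds are illustrative guesses."""
    failures = set()
    # Output collapse: every sample produced the identical string.
    if len(outputs) > 1 and len(set(outputs)) == 1:
        failures.add("output collapse")
    for prompt, out in zip(prompts, outputs):
        # Language drift: too few Arabic letters in the generation.
        if arabic_ratio(out) < min_arabic:
            failures.add("language drift")
        # Repetition loop: some n-gram repeats more often than allowed.
        if max_ngram_repeats(out) > max_repeats:
            failures.add("repetition loop")
        # Prompt echoing: the generation just starts by repeating the prompt.
        if prompt.strip() and out.strip().startswith(prompt.strip()):
            failures.add("prompt echo")
    return sorted(failures)
```

Calling `check_outputs(prompts, generations)` on a few samples at each eval step yields a list of labels that can be logged next to the loss, so a run that drifts into another language gets flagged even while the loss curve still looks healthy.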
Two lines to add to any existing training script:
from trainsafe import TrainSafeCallback
trainer = SFTTrainer(..., callbacks=[TrainSafeCallback()])
Works with SFTTrainer, DPOTrainer, GRPOTrainer, and base Trainer.
pip install trainsafe
Happy to answer questions. It's still early, so comments and feedback are welcome.