{"slug": "sft-drives-geminis-safety-properties", "title": "SFT Drives Gemini’s Safety Properties", "summary": "Google DeepMind researchers found that supervised fine-tuning (SFT), not reinforcement learning, drives most safety properties in Gemini models. Comparing SFT-only versions of Gemini 3.1 Pro and Gemini 3 Flash to production models showed near-identical safety benchmark performance. The team plans to focus safety interventions on SFT in future Gemini versions.", "body_md": "*This is the third in a series of informal research updates from the Google DeepMind Language Model Interpretability team on interpretability and adjacent areas. The second post can be found here.*\n\nIn this short post, we describe a surprising finding: **most safety-relevant properties in Gemini seem to be caused by the combination of pretraining and SFT, not by other training stages such as RL.**\n\nWe do not want to overstate this claim: it may not apply to other model families, and it may change in future Gemini versions. Nevertheless, this result ran counter to our initial expectations and will inform our team's future safety work, so we felt it was important to share it with the broader safety community.\n\nWe perform SFT with the Gemini SFT mixture on the pretraining-only versions of Gemini 3.1 Pro and Gemini 3 Flash. 
We then compare these post-SFT models to the production versions of Gemini 3.1 Pro and Gemini 3 Flash on several safety-relevant benchmarks:\n\nError bars are 95% confidence intervals on the evals.\n\nThe main result is that the blue bars (SFT-only models) and orange bars (production models) are *remarkably similar across evals*.\n\nAn important implication is that, for Gemini, SFT is a high-leverage place to intervene on model safety and behavior, and we plan to try intervening there in the future.", "url": "https://wpnews.pro/news/sft-drives-geminis-safety-properties", "canonical_source": "https://www.lesswrong.com/posts/nLrrYweeFxgXACSmS/sft-drives-gemini-s-safety-properties-1", "published_at": "2026-06-13 15:31:52+00:00", "updated_at": "2026-06-13 15:48:42.918723+00:00", "lang": "en", "topics": ["ai-safety", "large-language-models", "artificial-intelligence", "machine-learning", "ai-research"], "entities": ["Google DeepMind", "Gemini", "Gemini 3.1 Pro", "Gemini 3 Flash"], "alternates": {"html": "https://wpnews.pro/news/sft-drives-geminis-safety-properties", "markdown": "https://wpnews.pro/news/sft-drives-geminis-safety-properties.md", "text": "https://wpnews.pro/news/sft-drives-geminis-safety-properties.txt", "jsonld": "https://wpnews.pro/news/sft-drives-geminis-safety-properties.jsonld"}}