{"slug": "breaking-the-tokenizer-barrier-on-policy-distillation-across-model-families", "title": "Breaking the Tokenizer Barrier: On-Policy Distillation Across Model Families", "summary": "Researchers have developed a method for on-policy distillation (OPD) that works across different tokenizers, enabling knowledge transfer between large language models (LLMs) from different model families. The approach uses a token-mapping algorithm to propagate the teacher's token-level signals between tokenizers, and it is reported to be significantly more compute-efficient than baselines, such as supervised fine-tuning (SFT) on teacher-generated responses, across various benchmarks. The method expands the range of teacher-student pairs that can use OPD, which may improve LLM post-training.", "body_md": "# Computer Science > Machine Learning\n\n[Submitted on 8 Jun 2026]\n\n# Breaking the Tokenizer Barrier: On-Policy Distillation across Model Families\n\n[View PDF](https://arxiv.org/pdf/2606.09456)\n\n[HTML (experimental)](https://arxiv.org/html/2606.09456v1)\n\nAbstract: On-Policy Distillation (OPD) has become a core technique in the post-training of Large Language Models (LLMs) for transferring knowledge from domain experts to student models. However, existing OPD methods require the teacher and student models to share the same tokenizer, which restricts OPD to teacher-student pairs within a single model series. For cross-tokenizer distillation, current mainstream practice is Supervised Fine-Tuning (SFT) on teacher-generated responses, which fails to capture the rich knowledge embedded in the teacher's probability distribution. In this work, we extend the standard on-policy distillation method to operate across model families, using a precise token-mapping algorithm so that high-fidelity token-level signals can propagate between different tokenizers. Extensive experiments show that cross-tokenizer OPD is significantly more compute-efficient than baselines on various benchmarks.
Our results unlock a broader range of teacher-student pairs for OPD, opening up new avenues for adapting and enhancing interactions between LLMs.", "url": "https://wpnews.pro/news/breaking-the-tokenizer-barrier-on-policy-distillation-across-model-families", "canonical_source": "https://arxiv.org/abs/2606.09456", "published_at": "2026-06-29 03:37:59+00:00", "updated_at": "2026-06-29 03:58:04.855599+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "artificial-intelligence"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/breaking-the-tokenizer-barrier-on-policy-distillation-across-model-families", "markdown": "https://wpnews.pro/news/breaking-the-tokenizer-barrier-on-policy-distillation-across-model-families.md", "text": "https://wpnews.pro/news/breaking-the-tokenizer-barrier-on-policy-distillation-across-model-families.txt", "jsonld": "https://wpnews.pro/news/breaking-the-tokenizer-barrier-on-policy-distillation-across-model-families.jsonld"}}