Experience with dissimilar language ablation?

A Hugging Face user is exploring dissimilar language ablation: removing Mandarin, Russian, and Arabic from a language model so that it remains primarily Latin-based, with the aim of freeing capacity for further training or of pruning units that English does not activate. To support this, they are building a Swadesh-like list of noun/verb pairs across the four languages.

a13ph (https://discuss.huggingface.co/u/a13ph): Anyone here have experience with dissimilar language ablation? I’m thinking of ablating Mandarin/Russian/Arabic to leave a primarily Latin-based language model, in the hope of making some space for further training and/or safely pruning wherever English demonstrably has no activation. I’m presently creating a Swadesh-esque list of noun/verb (“thing”, “does”) pairs across the four languages, where each pair either matches every other pair in token count or gets padded to match, if necessary.
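One way to read the "token-matches or gets padded" step is sketched below. It is only an illustration under stated assumptions, not the poster's actual method: `pad_pairs_to_match` and `toy_tokenize` are hypothetical names, the byte-level tokenizer stands in for whatever tokenizer the target model uses, the pad id of 256 is arbitrary, and the example phrases are illustrative translations rather than entries from the poster's list.

```python
from typing import Callable, Dict, List


def pad_pairs_to_match(
    pairs: Dict[str, str],
    tokenize: Callable[[str], List[int]],
    pad_id: int,
) -> Dict[str, List[int]]:
    """Tokenize one noun/verb pair per language and right-pad each
    sequence to the length of the longest one."""
    encoded = {lang: tokenize(text) for lang, text in pairs.items()}
    target_len = max(len(ids) for ids in encoded.values())
    return {
        lang: ids + [pad_id] * (target_len - len(ids))
        for lang, ids in encoded.items()
    }


def toy_tokenize(text: str) -> List[int]:
    # Hypothetical stand-in tokenizer: one token id per UTF-8 byte (0..255).
    return list(text.encode("utf-8"))


PAD_ID = 256  # assumption: any id outside the toy vocabulary works as padding

# Illustrative "thing does" pair ("dog runs") in the four languages.
pairs = {
    "en": "dog runs",
    "ru": "собака бежит",
    "zh": "狗 跑",
    "ar": "الكلب يركض",
}

padded = pad_pairs_to_match(pairs, toy_tokenize, PAD_ID)
for lang, ids in padded.items():
    print(lang, len(ids), ids.count(PAD_ID))
```

With a real model, `tokenize` could instead wrap the model's own tokenizer, for example a Hugging Face tokenizer's `encode(text, add_special_tokens=False)`. Padded positions would presumably need to be masked out before comparing activations across languages, so that the padding itself does not register as language-specific signal.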