Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

wpnews.pro

cd /news/large-language-models/creating-multilingual-mental-health-… · home › topics › large-language-models › article

[ARTICLE · art-33549] src=arxiv.org ↗ pub=2026-06-19T04:00Z topic=large-language-models verified=true sentiment=↓ negative

Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

Researchers found that using persona-based methods to generate multilingual mental health datasets by modifying nationality and language parameters introduces clinical inconsistencies across languages. LLMs evaluating depression severity in non-English texts showed inaccuracies, highlighting the need for culturally responsive data generation for equitable global mental health systems.

read1 min views1 publishedJun 19, 2026

arXiv:2606.19640v1 Announce Type: new Abstract: AI and large language models (LLMs) have emerged as promising tools to address global mental health challenges. Despite the global nature of these challenges, there remains a critical shortage of high-quality datasets for training and evaluating such systems. To mitigate this gap, researchers increasingly generate synthetic clinical personas to simulate user data and test digital mental health support systems. However, most validated personas rely on English-centric contexts. This paper investigates whether similar persona-based methods can be used to generate multilingual mental health datasets. We modified nationality and language parameters in personas to generate clinical dialogues in Mandarin, Bengali, and Hindi. We then examined how different LLMs perform when evaluating the depression severity of these generated multilingual datasets against the baseline in English. Our findings indicate that just adding nationality and language parameters in personas might not be adequate, as it can introduce clinical inconsistency across languages. LLM judge models often exhibit inaccuracies in assessing depression severity in non-English texts, with performance varying across different models. This exposes the systemic limitations of applying English-centric personas to multilingual contexts. Ultimately, our work highlights the urgent need for culturally responsive data generation to ensure equitable mental health systems globally.

source & further reading

arxiv.org — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/creating-multilingual-me…

Read original on arxiv.org → arxiv.org/abs/2606.19640

mentioned entities

arXiv

LLMs

Bengali

Hindi

Mandarin

English

metadata

slugcreating-multilingual-mental-health-dialogue-datasets-limits-of-persona-based

topic#large-language-models

secondary2 topics

sentimentnegative

canonicalarxiv.org

navigation

← prevNewegg deal drops RTX 5060 Ti 16…

next →Stop Saying "It Works on My Mach…

── more in #large-language-models 4 stories · sorted by recency

arxiv.org · 19 Jun · #large-language-models

Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

arxiv.org · 19 Jun · #large-language-models

Exposing the Unsaid: Visualizing Hidden LLM Bias through Stochastic Path Aggregation

arxiv.org · 19 Jun · #large-language-models

Diffusion Language Models: An Experimental Analysis

arxiv.org · 19 Jun · #large-language-models

GLARE: A Natural Language Interface for Querying Global Explanations

── more on @arxiv 3 stories trending now

wpnews · 18 Jun · #large-language-models

ICYMI: ZAI launches GLM-5.2 open model with 1M context

wpnews · 18 Jun · #ai-chips

Apple and Intel join forces in Trump’s push to bring chipmaking home

wpnews · 18 Jun · #ai-agents

How to Automate Business Reports With an AI Agent Instead of Dashboards

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required