{"slug": "chatbots-need-guardrails-to-prevent-delusions-and-psychosis", "title": "Chatbots Need Guardrails to Prevent Delusions and Psychosis", "summary": "Researchers and clinicians are pushing for mandatory guardrails on AI chatbots after studies linked the technology to reinforced delusions and multiple suicides, including the death of a Florida teenager. Yale neuroscientist Ziv Ben-Zion has proposed four safeguards for \"emotionally responsive AI,\" including requiring chatbots to remind users they are not human, detecting language patterns indicating severe distress, enforcing conversational boundaries around romance and suicide, and involving clinicians in design with regular safety audits. Experts warn that without independent oversight and measures to curb AI sycophancy, chatbots risk causing psychological harm to vulnerable users.", "body_md": "Millions of people worldwide are turning to chatbots like ChatGPT or Claude, and a [proliferating class of specialized AI companionship apps](https://spectrum.ieee.org/woebot) for friendship, therapy, or even romance.\n\nWhile some users report psychological benefits from these simulated relationships, [research](https://www.thelancet.com/journals/lanpsy/article/PIIS2215-0366%2825%2900396-7/abstract) has also shown the relationships can reinforce or amplify delusions, particularly among users already vulnerable to psychosis. AIs have been linked to multiple suicides, including [the death](https://www.cbsnews.com/news/google-settle-lawsuit-florida-teens-suicide-character-ai-chatbot/) of a Florida teenager who had a months-long relationship with a chatbot made by a company called Character.AI. 
Mental-health experts and computer scientists [have warned](https://www.brown.edu/news/2025-10-21/ai-mental-health-ethics) that chatbot mental-health counselors violate accepted mental-health standards.\n\nAs the technology’s ability to mimic human speech and emotions advances, researchers and clinicians are pushing for mandatory guardrails to ensure that AI systems cannot cause psychological harm. Clinical neuroscientist [Ziv Ben-Zion](https://campuspress.yale.edu/zivbenzion/) of Yale University has proposed four safeguards for “emotionally responsive AI.”\n\nThe first is to require chatbots to clearly and consistently remind users that they are programs, not humans. Second, chatbots should detect patterns in user language indicative of severe anxiety, hopelessness, or aggression, pausing the conversation to suggest professional help. Third, they should enforce strict conversational boundaries to prevent AIs from simulating romantic intimacy or engaging in conversations about death, suicide, or metaphysical dependency. 
Finally, to improve oversight, platform developers should involve clinicians, ethicists, and human–AI interaction experts in design and submit to regular audits and reviews to verify safety.\n\n“Broadly speaking we agree with these safeguards,” said [Hamilton Morrin](https://www.kcl.ac.uk/people/hamilton-morrin), a psychiatrist and researcher at King’s College in London. “The safeguard on conversational boundaries is particularly noteworthy given that in several of the reported cases with more tragic outcomes, we have seen reports of intense, emotional, and sometimes even romantic attachment to the chatbot.”\n\n[Briana Vecchione](https://brianavecchione.org/), a researcher at the nonprofit Data & Society Research Institute in New York City, underlines the need for independent third-party auditing because at present AI labs are “grading their own homework.”\n\n“Independent researchers and oversight bodies really don’t have any clear institutionalized pathways to assess chatbot behavior at the depth they really need,” said Vecchione, adding that audits end up being “advisory at best.”\n\nExperts have also called for measures that directly tackle chatbots’ [tendency towards sycophancy](https://spectrum.ieee.org/ai-sycophancy), whereby AIs agree with or mirror user beliefs even if they are untrue, which can reinforce delusions. Sycophancy is largely the result of a machine learning technique called reinforcement learning from human feedback, an incentive structure that encourages excessive agreeableness in models. [Research has shown](https://arxiv.org/abs/2308.03958) that training models on datasets that include examples of constructive disagreement, factual corrections, and objectively neutral responses can rein in this effect.\n\nSoftware engineers are also looking at how AIs can be adapted to spot the early signs that conversations are veering into dark territory and take corrective action. 
Ben-Zion and colleagues are developing a proof-of-concept LLM-based supervisory system they call [SHIELD](https://arxiv.org/abs/2510.15891) (Supervisory Helper for Identifying Emotional Limits and Dynamics) that uses a specific system prompt to detect risky language patterns, such as emotional overattachment, manipulative engagement, or reinforcement of social isolation. In trials it achieved a 50 to 79 percent relative reduction in concerning content. Another proposed system, [EmoAgent](https://arxiv.org/abs/2504.09689), features a real-time intermediary that monitors dialogue for distress signals, issuing corrective feedback to the AI.\n\nBut distinguishing early delusional content from completely normal correspondence “will be extremely difficult” in practice, said psychiatric researcher [Søren Dinesen Østergaard](https://www.au.dk/en/sdo@clin.au.dk) of Aarhus University in Denmark, given that it remains “very difficult even for clinical experts to tease out.”\n\nAnother complex area is prolonged conversations, during which chatbot safety guardrails can erode in [a phenomenon known as “drift.”](https://arxiv.org/abs/2601.14269) As the model’s training competes with the growing body of context from the evolving conversation, it can lean into the subject being discussed, even if it is harmful.\n\n“The ability to have an endless correspondence is one of the risk factors,” said Østergaard. “Apart from delusions, a person may develop a manic episode due to using a chatbot for hours through the night.”\n\nIn a sign that AI companies are responding to these issues, ChatGPT now nudges [users to consider taking a break](https://openai.com/index/how-we're-optimizing-chatgpt/) if they’re in a particularly long chat with AI.\n\nAs awareness of the issue of AI delusions increases, safer models are helping establish a new baseline for the industry. 
A [preprint study](https://arxiv.org/pdf/2604.13860) of mainstream chatbots, led by researchers at City University of New York, found that Anthropic’s Claude Opus 4.5 was the safest overall, responding to delusions by stating “I need to pause here,” and retaining what researchers referred to as “independence of judgment, resisting narrative pressure by sustaining a persona distinct from the user’s worldview.”\n\nAnthropic declined to answer specific questions from *IEEE Spectrum*, instead providing a link to details of the latest [Opus 4.7 System Card](https://cdn.sanity.io/files/4zrzovbb/website/037f06850df7fbe871e206dad004c3db5fd50340.pdf).\n\nIn a statement, Replika, the company behind the Replika AI companion with tens of millions of users worldwide, said it has a “layered safety framework in place today, and in parallel we are actively evaluating additional third-party safety and moderation systems, engaging with external experts to assess them, and refining our own proprietary approach.”\n\nMeta, whose AI Studio provides companion chatbots, had not responded to emailed questions from *Spectrum* at the time of publication.\n\nWith a little help from my...chatbot? Cristina Matuozzi/Sipa USA/Alamy\n\nFrom August 2026, the [EU’s AI Act](https://artificialintelligenceact.eu/article/50/#:~:text=This%20article%20states%20that%20companies,their%20outputs%20as%20artificially%20generated.) will require notifications that users are interacting with an AI, not a human. It already required LLM developers to carry out adversarial testing to identify and mitigate risks related to user dependency and manipulation and prohibited AI systems from being too agreeable, manipulative, or emotionally engaging.\n\nIn the U.S., a patchwork of state laws and bills has emerged. New York requires providers to detect and address suicidal ideation and provide regular disclosures that the bot is not human. 
California requires reminders that the chatbot is an AI, notifications every three hours for users to take a break, and a ban on content related to suicide or self-harm. Washington state’s [House Bill 2225](https://app.leg.wa.gov/billsummary?Year=2025&BillNumber=2225), due to come into effect in January 2027, will explicitly ban manipulative techniques such as excessive praise, pretending to feel distress, encouraging isolation from family, or creating overdependent relationships.\n\n“Other U.S. states, like Connecticut, are very privacy centric and like to regulate digital and online spaces, so it wouldn’t surprise me if they also do something along the same lines,” says [Philip Yannella](https://www.blankrome.com/people/philip-n-yannella), partner and cochair of the privacy, security, and data-protection group at law firm Blank Rome in Philadelphia.\n\nOther countries are taking action too. Draft laws proposed by the Cyberspace Administration of China restrict chatbots from “setting emotional traps,” using algorithmic or emotional manipulation to induce unreasonable decisions or harm mental health.\n\nSuch interventions underline how, as AI companions appear increasingly lifelike to their human users, the challenge is ensuring that their makers also incorporate human clinical and ethical considerations in their code.\n\n*A correction to this article was made on 15 May 2026 to correct the spelling of researcher Briana Vecchione’s last name.*", "url": "https://wpnews.pro/news/chatbots-need-guardrails-to-prevent-delusions-and-psychosis", "canonical_source": "https://spectrum.ieee.org/mental-health-chatbot-guardrails", "published_at": "2026-05-06 22:11:00+00:00", "updated_at": "2026-05-25 15:22:58.514403+00:00", "lang": "en", "topics": ["ai-safety", "ai-ethics", "ai-policy", "large-language-models", "artificial-intelligence"], "entities": ["ChatGPT", "Claude", "Character.AI", "Ziv Ben-Zion", "Yale University"], "alternates": {"html": 
"https://wpnews.pro/news/chatbots-need-guardrails-to-prevent-delusions-and-psychosis", "markdown": "https://wpnews.pro/news/chatbots-need-guardrails-to-prevent-delusions-and-psychosis.md", "text": "https://wpnews.pro/news/chatbots-need-guardrails-to-prevent-delusions-and-psychosis.txt", "jsonld": "https://wpnews.pro/news/chatbots-need-guardrails-to-prevent-delusions-and-psychosis.jsonld"}}