AI chatbots nail mental health facts but miss mark on accessibility
With increasing numbers of individuals turning to AI systems for information, these tools are often the first point of contact for health advice, including on sensitive issues such as depression, anxiety, and behavioral disorders.
Artificial intelligence is increasingly shaping how people understand mental health, but a new study warns that accuracy alone is not enough. Filipe Prazeres, from the University of Beira Interior and the University of Porto, Portugal, analyzed whether two leading AI chatbots, ChatGPT-4o and Google Gemini, can correctly identify and dispel common myths about children’s and adolescents’ mental health.
The research, titled “Can AI Models like ChatGPT and Gemini Dispel Myths About Children’s and Adolescents’ Mental Health? A Comparative Brief Report,” was published in Psychiatry International. It raises a critical question for the digital era: can AI effectively fight misinformation if its responses are out of reach for the very audiences that need them most?
The findings reveal a clear paradox: both chatbots deliver 100 percent accuracy when fact-checking mental health myths but use language far too complex for many parents, teachers, and adolescents to understand.
AI models show perfect accuracy but poor readability
The author conducted the comparative study against a backdrop of growing mental health challenges among Europe's youth. According to UNICEF's 2024 State of Children in the European Union report, over 11 million European children and adolescents, roughly 13 percent of that age group, live with mental health disorders. Despite this, stigma and misinformation continue to prevent early diagnosis and treatment.
The research tested whether AI chatbots could act as accessible educational tools by correctly identifying myths and explaining them in an understandable way. Seven prevalent misconceptions were selected from the WHO–UNICEF “Teacher’s Guide for Social and Emotional Learning,” including beliefs such as “mental illness is a sign of weakness,” “bad parenting causes mental illness,” and “children do not experience depression.”
Using the same prompt, both ChatGPT-4o (September 2024 version) and Gemini (September 2024 App version) were asked to classify each statement as true or false and explain their reasoning. Each response was then analyzed for accuracy, length, and readability using standardized scoring systems, including the Flesch–Kincaid Grade Level and Reading Ease indices.
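The paper does not publish its analysis scripts, but the readability step is straightforward to approximate. Below is a minimal sketch, assuming the open-source Python package textstat (not a tool named in the study) and using an illustrative placeholder reply rather than an actual model output:

```python
# Minimal sketch: scoring a chatbot reply for length and readability.
# Assumes the textstat package (pip install textstat); the reply text is an
# illustrative placeholder, not a real ChatGPT-4o or Gemini output.
import textstat

reply = (
    "False. Mental illness is not a sign of weakness. It results from a "
    "combination of biological, psychological, and environmental factors, "
    "and seeking help is a sign of strength, not a personal failing."
)

print("Word count:                ", len(reply.split()))
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(reply))
print("Flesch Reading Ease:       ", textstat.flesch_reading_ease(reply))
```

Scoring a genuine chatbot reply in the same way would approximate the length and readability figures the study reports.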
Both models achieved perfect accuracy, correctly identifying all seven myths as false and providing explanations consistent with WHO and UNICEF guidelines. However, their responses revealed a crucial limitation: the language sat well above what many lay readers comfortably handle, with ChatGPT scoring 11.7 and Gemini 10.2 on the Flesch–Kincaid Grade Level, readability associated with the final years of high school and early college.
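For context, the two indices used in the study are computed from average sentence length and syllable density; their standard definitions (not restated in the paper) are:

Flesch–Kincaid Grade Level = 0.39 × (words per sentence) + 11.8 × (syllables per word) - 15.59
Flesch Reading Ease = 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word)

A grade level above 10 therefore reflects the long sentences and polysyllabic vocabulary typical of academic writing, while the Reading Ease score falls as sentences lengthen.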
While the responses were factually sound and professionally structured, the author notes that their sophistication could alienate readers without advanced health literacy. The study underscores that linguistic accessibility remains a barrier in AI-driven health communication, particularly when addressing vulnerable populations like adolescents or non-specialist caregivers.
Misinformation and the role of AI in public health education
Misinformation surrounding mental health remains widespread, particularly online. False beliefs, such as the idea that mental illness reflects moral failure or lack of discipline, contribute to social stigma, underdiagnosis, and delayed treatment. By assessing how well AI models can counter such misconceptions, Prazeres’s research addresses a vital public health concern: can automated systems improve mental health literacy at scale?
The findings suggest that large language models (LLMs) like ChatGPT and Gemini have the potential to reduce misinformation and reinforce evidence-based understanding. Both models drew upon scientifically sound explanations and rejected harmful stereotypes. Yet the study cautions that technical accuracy alone does not guarantee effective education.
In real-world conditions, users rarely prompt chatbots for simpler or age-appropriate explanations. As a result, the default responses, written in academic or medical language, may fail to resonate with teenagers, parents, or educators seeking clear guidance. The study argues that user experience design and readability adjustments must become integral to AI health applications if they are to fulfill their educational promise.
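The paper stops short of prescribing a mechanism for such adjustments. As an illustration only, the sketch below shows one way a readability target could be enforced around any chat model; the generate callable, the 8th-grade threshold, and the textstat dependency are all assumptions of this example, not elements of the study.

```python
# Sketch of a readability guardrail: re-prompt a chatbot until its answer
# falls at or below a target Flesch-Kincaid grade level.
import textstat

def ask_with_readability_target(generate, question, max_grade=8.0, max_tries=3):
    answer = generate(question)
    for _ in range(max_tries):
        if textstat.flesch_kincaid_grade(answer) <= max_grade:
            break
        # Ask the model to simplify its own previous answer.
        prompt = (
            "Rewrite the following answer in plain, conversational language "
            f"suitable for a {int(max_grade)}th-grade reader:\n\n{answer}"
        )
        answer = generate(prompt)
    return answer

def dummy_model(prompt):
    # Stand-in so the sketch runs without an API key; a real integration
    # would call ChatGPT, Gemini, or another chat model here.
    return "Depression can affect children too. It is an illness, not a choice."

print(ask_with_readability_target(dummy_model, "Is it true that children never get depressed?"))
```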
The study also raises concerns about transparency and accountability. While both AI models delivered accurate content, neither cited external sources or provided references, limiting users’ ability to verify the information. This lack of traceability, common among LLMs, presents a challenge for establishing public trust in AI-mediated health information.
Bridging the gap between accuracy and comprehension
While ChatGPT-4o and Gemini performed equally well in factual correctness, their differences in response style reflect broader challenges in designing AI tools for public engagement. ChatGPT’s responses tended to be more structured and explanatory, while Gemini’s were shorter but less contextually rich. Despite these stylistic nuances, both systems produced high-level medical reasoning, which, while informative, is often inaccessible to younger or non-expert readers.
The analysis suggests that future iterations of AI chatbots must focus on contextual adaptability, the ability to tailor responses dynamically based on the user’s age, reading ability, and emotional context. For example, an adolescent asking about mental health myths should receive responses written in clear, conversational language rather than academic prose.
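No such mechanism is described in the study itself, but as a purely illustrative sketch, contextual adaptation could start with something as simple as selecting a different system instruction per age band before any factual content is generated:

```python
# Illustrative only (not from the study): pick an age-appropriate system
# instruction so the same factual content is delivered at a suitable
# reading level.
PROMPTS_BY_AGE = {
    "child": "Explain in very simple words that a 10-year-old would understand.",
    "adolescent": "Explain in clear, friendly, conversational language.",
    "adult": "Explain clearly and avoid unnecessary medical jargon.",
}

def system_prompt_for(age: int) -> str:
    if age < 13:
        return PROMPTS_BY_AGE["child"]
    if age < 18:
        return PROMPTS_BY_AGE["adolescent"]
    return PROMPTS_BY_AGE["adult"]

print(system_prompt_for(15))  # adolescent-oriented instruction
```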
The study also calls for broader research into cross-linguistic and cultural performance. Because the test was conducted exclusively in English, the results may not represent how well these models perform in other languages or in low-resource linguistic settings. Considering that millions of children worldwide access AI tools in their native languages, ensuring multilingual inclusivity is key to equitable digital health literacy.
The study further recommends collaboration between AI developers, educators, and public health experts to create hybrid communication systems where chatbots support, but do not replace, human professionals. For example, integrated AI-assisted teaching tools in schools could help debunk myths while maintaining human oversight and emotional support.

