Early promise and major evidence gaps in AI psychological training tools

Artificial intelligence could help train parents, teachers and therapists in psychological approaches for children's mental health, but the current evidence is too weak to support broad claims of effectiveness, researchers report in a systematic review that exposes a gap between the promise of scalable AI training and the quality of studies testing it.

The study, titled "Artificial Intelligence (AI) Tools for Training Caregivers, Educators, and Therapists in Psychological Approaches: A Systematic Review" and published in the journal AI, reviewed 24 studies from nine countries, published between 2019 and 2026, that evaluated AI-based tools used to train caregivers, educators and therapists in psychological approaches relevant to child and adolescent mental health.

Child mental health crisis drives search for scalable training

The adults closest to children are central to mental health outcomes, but many of them lack access to high-quality psychological training. Parents and caregivers shape children's early emotional development. Teachers are often the first adults outside the family to notice distress, behavioural changes or emotional difficulties linked to learning. Therapists and practitioners need continued training to deliver evidence-based care with fidelity.

Mental health conditions affect roughly one in seven children and adolescents, and many disorders begin before adulthood. Treatment gaps remain severe, especially in low- and middle-income countries, where most affected children receive no evidence-based care. Even in high-income settings, access to specialist support is limited by long waits, workforce shortages and unequal distribution of services.

The authors argue that specialist clinical services alone cannot meet the scale of need. Training the adults already around children is therefore not a secondary measure but a core response to the child mental health gap. Caregivers can learn behavioural parenting strategies. Educators can learn classroom support and early identification methods. Therapists can improve practical skills through structured practice, feedback and supervision.

Traditional training models, however, are hard to scale. Workshops, supervision, accredited programmes and live modelling require trainers, time, travel, money and institutional capacity. These demands often exclude the families, schools and practitioners most in need of support. Digital training has helped widen access, but many tools remain static, offering videos, PDFs or self-directed modules with little active practice.

AI could change that by making training interactive, adaptive and available on demand. The review identifies several AI formats, including natural language processing chatbots, generative AI systems, large language model-based role-play tools, AI-integrated virtual reality, intelligent tutoring systems and automated feedback platforms. These tools can simulate clients, children or parents, give feedback, support repeated practice and allow users to rehearse sensitive conversations without real-world risk.

However, the review draws a firm line between promise and proof. The authors found that feasibility and acceptability were generally positive, but rigorous evidence of effectiveness remains limited. Most studies were small, many with fewer than 30 participants, and many relied on self-reported outcomes without long-term follow-up. No study followed participants for more than about one month, so it remains unknown whether any training gains last.

Chatbots, VR and LLMs show different strengths across users

The 24 studies were grouped into three areas: caregiver training, educator training and therapist or practitioner training. Caregiver studies focused mainly on behavioural parenting approaches. Educator studies focused on classroom behaviour management and functional communication training. Therapist studies covered a wider field, including motivational interviewing, cognitive behavioural therapy, written exposure therapy, person-centred counselling, suicide risk assessment and ecological psychological facilitation.

For caregiver training, all five studies used chatbot or conversational AI systems to deliver behavioural parenting support. Early tools used rule-based or NLP chatbots, while newer systems used generative AI and retrieval-augmented architectures. The review found high engagement and positive user experience in several caregiver studies, including strong completion rates and high satisfaction. One generative AI parenting tool showed promising pre–post improvements in child behaviour and caregiver mental health, but it was an uncontrolled pilot, meaning the findings cannot prove the AI caused the improvements.

The only caregiver randomized controlled trial found no significant between-group effects on parenting knowledge, self-efficacy or child behaviour after a brief chatbot intervention. The authors caution that the null result should be interpreted carefully because the intervention lasted about 15 minutes, some participants already had high baseline knowledge, and planned follow-up was disrupted by a platform policy change.

For educator training, the strongest evidence came from AI-integrated virtual reality. One randomized controlled trial found that teacher candidates using a VR simulation performed far better than controls on functional communication training procedural skills. The gains were large and persisted at maintenance testing. However, the tool improved procedural skill more than declarative knowledge, suggesting that AI simulation may be strongest when the training target is structured, observable and easy to score.

The review concludes that AI training appears most effective for clearly defined skills that can be practised repeatedly and evaluated against explicit criteria, such as reflective listening in motivational interviewing or the procedural steps of functional communication training. It is less clear whether AI can train complex relational skills, ethical judgment, cultural competence or clinical flexibility.

For therapist and practitioner training, the evidence base was larger but more mixed. Tools included chatbot clients, GPT-based virtual patients, CBT training apps, AI suicide-risk simulators, and AI systems that played the role of client, consultant or supervisor. Several studies found that trainees liked the tools and felt they offered a psychologically safe space to practise. Some reported increased confidence and improved self-efficacy.

One motivational interviewing study showed durable improvements in reflective listening after training with an AI client simulator. Another study using AI-generated feedback improved some questioning behaviours. CBT tools also showed promise, especially for assessment and information-gathering skills. But the review found that improvements in ethical decision-making and cultural competence were minimal or absent in some systems.

The authors identify a recurring flaw: AI clients were often too agreeable, too orderly and too quick to resolve problems. Real therapy involves ambiguity, resistance, emotion, silence, cultural context, non-verbal cues and nonlinear progress. If AI simulations make clients easier than real clients, trainees may practise an oversimplified version of clinical work. The review warns that this is not just a user-experience limitation but a potential training risk.

Evidence gap raises caution for mental health AI rollout

The review identifies what it calls a credibility-accessibility paradox. The tools with the strongest controlled evidence, especially AI-integrated VR, are often the least scalable because they require hardware, infrastructure and high development costs. The tools that are easiest to scale, especially LLM-based chatbots available on phones or laptops, have weaker evidence, often based on feasibility studies rather than robust trials.

The populations most likely to need scalable training, including under-resourced schools, community services and practitioners in low- and middle-income settings, may be offered the least tested tools. Without stronger evidence, AI could spread quickly in child mental health training before researchers know whether it builds lasting skills or merely increases confidence.

The review also highlights several ethical concerns:

  • Safety guardrails in general-purpose AI systems may block training in high-risk clinical scenarios such as suicidality, even though these are precisely the areas where supervised practice is most needed. One purpose-built suicide-risk simulator showed that more responsible training is possible, but it requires deliberate design rather than reliance on general chatbot systems.
  • Developer involvement is another concern. Many studies were conducted by teams with a direct role in building the tools being evaluated. This raises the risk that study design, outcome selection and interpretation may be influenced by investment in the tool's success. The authors call for independent evaluations, pre-registered trials and transparent reporting of conflicts of interest.
  • Data privacy and platform dependency also remain underexamined. AI training tools may collect sensitive information from caregivers, educators or practitioners, including details about children, families or clinical work.
  • Research systems built on commercial platforms may also be vulnerable to sudden policy changes, cost shifts or access restrictions.

The review calls for adequately powered randomized controlled trials with active comparators, validated measures, independent research teams and follow-up over several months. It also urges researchers to test AI tools across more diverse populations, including non-English-speaking users and low-resource settings, where the need for scalable training is greatest.

Future studies must also be clearer about what AI training is meant to change. Some studies measured satisfaction; others measured self-efficacy, knowledge, coded skill performance or downstream child outcomes. Without a shared theory of change and standardised outcomes, the field will struggle to compare results or build cumulative evidence.

  • FIRST PUBLISHED IN:
  • Devdiscourse