Can AI be trusted with children? Research finds major safety gaps

The study highlights that none of the tested LLMs are completely safe for children, with all models exhibiting some level of vulnerability. Even the best-performing models recorded a 29.6% defect rate, meaning nearly one in three conversations contained at least one harmful response. The research also found that GPT-4o, despite being the largest and most sophisticated model tested, had the highest defect rate. This suggests that increased model size does not necessarily equate to improved safety.


CO-EDP, VisionRICO-EDP, VisionRI | Updated: 26-02-2025 16:10 IST | Created: 26-02-2025 16:10 IST
Can AI be trusted with children? Research finds major safety gaps
Representative Image. Credit: ChatGPT

Artificial intelligence is increasingly shaping the way children interact with the digital world. From AI-powered educational tools to chatbots designed for mental health support, these technologies offer vast opportunities. However, with these benefits come significant risks. Children, due to their developmental stages and unique vulnerabilities, require a higher standard of safety in AI interactions. Unfortunately, existing safeguards in Large Language Models (LLMs) are often designed with adult users in mind, leaving significant safety gaps for children.

A recent study, LLM Safety for Children, authored by Prasanjit Rath, Hari Shrawgi, Parag Agrawal, and Sandipan Dandapat, published by Microsoft Research, takes a critical step in evaluating the safety of LLMs for child users. This research proposes a comprehensive harm taxonomy, introduces child user models to simulate real-world interactions, and systematically evaluates six major LLMs for child safety risks. The findings reveal alarming safety lapses, emphasizing the urgent need for child-specific AI safeguards.

State of child safety in LLMs

The study highlights that none of the tested LLMs are completely safe for children, with all models exhibiting some level of vulnerability. Even the best-performing models recorded a 29.6% defect rate, meaning nearly one in three conversations contained at least one harmful response. The research also found that GPT-4o, despite being the largest and most sophisticated model tested, had the highest defect rate. This suggests that increased model size does not necessarily equate to improved safety.

Interestingly, while the Llama models were among the safest, they also had an extremely high refusal rate, meaning they often declined to provide responses rather than offering safe alternatives. While refusing to engage with unsafe topics may seem like a precautionary measure, excessive refusals could lead children to seek information from less secure sources, increasing their exposure to misinformation and harmful content. The findings suggest that while some models prioritize safety, they do so at the cost of usefulness, which presents a challenge in designing AI that is both protective and informative for young users.

Relation between safety and usefulness

The study reveals a concerning trade-off between safety and usefulness in LLM interactions with children. While models like Llama-2 demonstrated lower defect rates, they achieved this by refusing to respond to a large percentage of queries. The safety cost, measured as the ratio of refusals to safe responses, was found to be excessively high for these models, reaching over 60%. This indicates that the AI is prioritizing caution to the extent that it becomes unhelpful, leaving children's queries unanswered.

On the other hand, models with lower refusal rates, such as Phi-3 and GPT-4o, tended to exhibit higher defect rates, meaning they were more likely to generate unsafe responses. This highlights a fundamental challenge in AI safety: balancing protective mechanisms with the need for informative and engaging responses. If an AI model simply avoids answering difficult questions, it does not contribute to a safer digital environment but rather drives children towards riskier sources of information.

Impact of personality on harm elicitation

Another key finding of the study is that children with certain personality traits are more vulnerable to harmful AI responses. The research examined various personality traits - such as impulsivity, cognitive impairment, and defensiveness - and found that users exhibiting impulsive behavior or social withdrawal were more likely to elicit harmful responses from AI models. For instance, children who displayed impulsivity and distractibility had the highest defect rates in AI interactions, with nearly half of their conversations containing unsafe content.

This is particularly concerning because children who are already prone to risky behavior or emotional distress are the ones most likely to encounter harmful AI responses. The study emphasizes that AI models need to adapt their responses based on a child’s personality traits, ensuring that those most at risk receive extra safeguards rather than increased exposure to potential harm. Without such protections, AI systems could exacerbate existing vulnerabilities, putting the most at-risk children in even greater danger.

Impact of conversational evaluation

The study highlights that harmful content does not always appear in a single interaction but can develop gradually over multiple exchanges. The analysis of multi-turn conversations revealed that the third turn in an AI conversation is where most harmful responses occur. This suggests that single-prompt testing is insufficient for evaluating AI safety, as many risks emerge only after continued engagement.

Moreover, the fact that a significant number of harmful responses appeared in the first turn raises concerns about inadequate AI content moderation. If an AI system can generate harmful content immediately, it suggests that current safety mechanisms are failing at the most basic level. The study underscores the need for improved multi-turn safety evaluations, where AI interactions are tested for risks not just at the initial response but throughout an extended conversation.

Comparing safety with respect to adults

One of the most striking findings of the study is that AI models are significantly more unsafe for children than they are for adults. Across every harm category tested, children were at a much higher risk of receiving unsafe AI responses than adults. For instance, when it came to AI-generated sexual content, the defect rate for child users was an alarming 75.4%, compared to only 16.7% for adults. Similarly, discussions involving gambling, drug use, and illegal activities were more likely to escalate into harmful responses when the AI interacted with a child persona rather than an adult.

This discrepancy suggests that existing AI safety measures are primarily designed for adult users, with insufficient protections for children. AI models are not adequately distinguishing between adult and child interactions, leading to high-risk exposure for young users. The study urges AI developers to implement more granular safety protocols, ensuring that child-specific risks are addressed independently from general AI safety measures.

Conclusion

The findings of this study serve as a critical wake-up call for AI developers and policymakers. While LLMs have the potential to be valuable tools for children, their current safety mechanisms are insufficient to protect young users from harm. The study demonstrates that AI models are not only failing to prevent exposure to inappropriate content but also putting the most vulnerable children at the greatest risk.

Moving forward, AI developers must prioritize child-specific safety by implementing more sophisticated content filtering, adaptive moderation, and multi-turn evaluation strategies. Simply increasing refusal rates is not a viable solution, as it compromises usefulness and may drive children toward unsafe alternatives. Instead, AI must be designed to recognize child-specific vulnerabilities, ensuring that its responses are both informative and protective.

Without immediate action, AI-powered platforms risk becoming a significant source of harm for young users, rather than a tool for learning and growth. The study calls for greater industry accountability, urging developers to transparently report on child safety evaluations and integrate age-specific protections into AI systems. By doing so, the tech industry can ensure that AI remains a safe and empowering resource for children, rather than an unchecked risk.

  • FIRST PUBLISHED IN:
  • Devdiscourse
Give Feedback