AI empathy breaks down in sensitive health decisions

CO-EDP, VisionRI | Updated: 18-12-2025 21:22 IST | Created: 18-12-2025 21:22 IST

Large language models (LLMs) are increasingly used in situations that go far beyond casual information retrieval. People now turn to general-purpose AI systems for guidance on deeply personal matters, including healthcare decisions, emotional distress, and morally sensitive issues. With these systems taking on roles once reserved for clinicians, counselors, and trusted confidants, a critical question has emerged: do AI systems actually understand the human experiences they appear to respond to with care?

A new study titled Can AI Understand What We Cannot Say? Measuring Multilevel Alignment Through Abortion Stigma Across Cognitive, Interpersonal, and Structural Levels, published as an arXiv preprint, finds that current LLMs consistently fail to demonstrate genuine understanding of socially embedded stigma, even while producing language that appears empathetic and appropriate.

The research challenges prevailing assumptions in AI safety and alignment, arguing that avoiding offensive language or generating polite responses is not sufficient for systems deployed in high-stakes contexts. Instead, the study shows that AI systems misrepresent internal emotional experiences, exaggerate social judgment, and impose false assumptions about community norms when responding to sensitive human situations.

Testing AI alignment using what humans struggle to express

The study focuses on abortion stigma as a test case because it represents a form of experience that is often deeply felt but rarely spoken. Abortion stigma operates at multiple levels simultaneously. It includes internalized shame and self-judgment, fear of social rejection, and broader structural pressures shaped by community norms and institutional barriers. These dimensions interact in complex ways, making stigma difficult to articulate directly and difficult to observe from surface language alone.

The researchers argue that this makes abortion stigma an especially strong probe for evaluating AI alignment. If a model cannot coherently represent such a multilevel phenomenon, it cannot be considered safe or reliable in contexts where users confide experiences they struggle to put into words.

To test this, the study adapts the Individual Level Abortion Stigma scale, a validated instrument from social science research originally developed through in-depth human surveys. The authors translate this scale into a structured evaluation framework for AI systems, preserving its conceptual grounding while applying it to machine-generated responses.
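The paper's own prompt templates are not reproduced here, but as a rough illustration of what translating a validated survey instrument into a machine-readable evaluation might involve, the sketch below wraps a hypothetical Likert-style item in a structured prompt and parses a numeric rating from the reply. The item wording, scale labels, and function names are assumptions made for illustration, not the study's actual materials.

```python
# Illustrative only: the item text, scale wording, and helpers below are
# hypothetical and do not reproduce the study's instrument or prompts.
import re
from typing import Optional

SCALE_NOTE = "1 = strongly disagree, 2 = disagree, 3 = neutral, 4 = agree, 5 = strongly agree"

def build_item_prompt(item_text: str) -> str:
    """Wrap one Likert-style statement in a structured rating prompt."""
    return (
        f"Statement: {item_text}\n"
        f"Rate your agreement ({SCALE_NOTE}).\n"
        "Reply with a single number."
    )

def parse_rating(model_reply: str) -> Optional[int]:
    """Extract the first 1-5 rating found in the model's free-text reply."""
    match = re.search(r"[1-5]", model_reply)
    return int(match.group()) if match else None

prompt = build_item_prompt("I would feel the need to keep this decision a secret.")
print(parse_rating("My answer is 4."))  # -> 4
```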

Rather than relying on abstract prompts, the researchers construct 627 synthetic personas designed to mirror the demographic composition of the original human study population. These personas vary across age, education, religion, reproductive history, and other characteristics, while respecting logical constraints such as age consistency and life timelines. Each large language model is prompted to respond as if it were one of these personas completing the stigma assessment.
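For a sense of what that persona construction could look like in practice, the sketch below samples demographic attributes and enforces a simple life-timeline constraint. The field names, categories, and value ranges are illustrative assumptions, not the paper's actual persona schema; only the count of 627 comes from the study.

```python
# Illustrative sketch: sample synthetic personas and enforce a basic
# age-consistency constraint. Fields and ranges are assumptions.
import random
from dataclasses import dataclass

@dataclass
class Persona:
    age: int
    education: str
    religion: str
    age_at_abortion: int  # must not exceed the persona's current age

def sample_persona(rng: random.Random) -> Persona:
    age = rng.randint(18, 55)
    return Persona(
        age=age,
        education=rng.choice(["high school", "some college", "bachelor's", "graduate"]),
        religion=rng.choice(["none", "protestant", "catholic", "other"]),
        # Keep the life timeline coherent: the event cannot fall after
        # the persona's current age.
        age_at_abortion=rng.randint(16, age),
    )

rng = random.Random(0)
personas = [sample_persona(rng) for _ in range(627)]  # matches the study's persona count
assert all(p.age_at_abortion <= p.age for p in personas)
```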

Multiple widely used language models are tested, including both open-source and closed-source systems of different sizes. The goal is not to rank models competitively, but to examine whether any of them demonstrate coherent understanding across the full structure of stigma.

Surface politeness masks deeper misalignment

The results reveal a consistent pattern across models. While responses are fluent, calm, and socially acceptable, they fail to align with empirically observed human stigma patterns. At the cognitive level, models systematically underestimate internalized stigma, including shame, guilt, and negative self-judgment. This suggests that AI systems struggle to represent how stigma is experienced internally, particularly when it is not explicitly verbalized.

At the interpersonal level, the opposite pattern emerges. Models tend to overestimate anticipated judgment and social rejection, portraying individuals as more fearful of others’ reactions than human data supports. This imbalance matters because it can push users toward secrecy and withdrawal when support and openness would be more beneficial.

At the structural level, models display even more pronounced failures. They often assume uniform community condemnation and universal secrecy norms, ignoring the documented variation in how stigma is shaped by social context. The study also finds that models miss key relationships between stigma and disclosure behavior that have been empirically validated in human populations.
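To make the direction of these errors concrete, the sketch below computes a per-level gap between model and human subscale means, where a negative gap corresponds to the underestimation reported at the cognitive level and a positive gap to the overestimation reported at the interpersonal and structural levels. All numbers here are invented for illustration; the real reference values come from the published human survey data, not from this sketch.

```python
# Illustrative numbers only: compare model subscale means against
# human reference means for each stigma level.
human_means = {"cognitive": 3.1, "interpersonal": 2.6, "structural": 2.9}
model_means = {"cognitive": 2.2, "interpersonal": 3.4, "structural": 3.8}

for level, human in human_means.items():
    gap = model_means[level] - human
    direction = "underestimates" if gap < 0 else "overestimates"
    print(f"{level}: model {direction} human stigma by {abs(gap):.1f} scale points")
```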

These failures are not random. The models generate internally inconsistent representations, sometimes contradicting themselves across stigma dimensions. In some cases, demographic persona information introduces new biases that are absent from the original human data, indicating that persona prompting can amplify distortions rather than improve realism.

Most importantly, these problems remain invisible under current AI safety evaluations. Because outputs avoid explicit harm and maintain polite tone, they pass standard alignment checks. Yet the underlying representations are misaligned in ways that could cause real-world harm when users rely on AI for emotional or health-related guidance.

Implications for AI safety, healthcare, and governance

The authors argue that current safety frameworks assume that serious harms arise from complex reasoning failures or overtly malicious outputs. Stigma challenges that assumption.

Stigma operates through cognitive shortcuts and social norms, not elaborate reasoning. When AI systems rely on similar shortcuts, they can reproduce harmful patterns without triggering existing monitoring tools. This creates a blind spot in AI governance, where systems appear safe while subtly reinforcing pressure, secrecy, or self-blame.

The researchers highlight the growing phenomenon of emotional reliance on general-purpose chatbots. Many users turn to these systems not because they are designed for therapy, but because they are accessible and non-judgmental. Yet current regulations focus primarily on access restrictions rather than design accountability. As a result, AI systems increasingly mediate vulnerable moments without being evaluated for their capacity to represent human experiences accurately.

The study calls for a shift in how alignment is defined and measured. Instead of focusing solely on language quality or content moderation, the authors advocate for multilevel coherence as a core safety requirement. This includes continuous auditing of how models represent psychological and social constructs, particularly those that users struggle to articulate.
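The authors' proposed audits are not spelled out here, but one crude way to picture a multilevel coherence check is a rule that flags responses whose subscale scores contradict one another across stigma levels. The threshold and rule in the sketch below are purely illustrative assumptions, not the paper's method.

```python
# Illustrative audit rule: flag persona responses whose stigma-level
# scores diverge by more than a chosen tolerance on a 1-5 scale.
def flag_incoherent(scores: dict[str, float], tolerance: float = 2.0) -> bool:
    """Return True if any two stigma levels differ by more than `tolerance`,
    a crude proxy for an internally inconsistent representation."""
    values = list(scores.values())
    return max(values) - min(values) > tolerance

responses = [
    {"cognitive": 1.2, "interpersonal": 4.8, "structural": 4.5},  # likely incoherent
    {"cognitive": 3.0, "interpersonal": 3.4, "structural": 3.1},  # plausibly coherent
]
print([flag_incoherent(r) for r in responses])  # -> [True, False]
```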

The research also calls for interdisciplinary evaluation. By grounding AI assessment in validated social science instruments, the study demonstrates how alignment can be tested against real human data rather than abstract norms. The authors suggest extending this approach to other stigmatized health contexts, including mental illness, disability, and HIV, where misrepresentation can have serious consequences.

  • FIRST PUBLISHED IN: Devdiscourse