AI’s next safety challenge is helping users know when to trust it


A new review paper published in AI & Society finds that research on how people form mental models of AI is growing quickly but remains fragmented across theories, methods and assumptions. Mental models are the internal explanations people build to understand and predict how a system works, and the fragmentation makes it harder to design AI systems that people can use with appropriate trust and control.

The study, titled "Users' mental models within human–AI interaction: a systematic scoping review", examined 52 empirical studies on users' mental models and expectations in human-AI interaction. It finds that the field is expanding rapidly but still lacks shared concepts, stronger qualitative evidence and long-term research on how users' understanding of AI changes over time.

Misread AI systems can lead to overtrust or rejection

In human-AI interaction, mental models shape whether users rely on AI, challenge it, ignore it, overestimate it or treat it as more human-like than it is. The authors argue that this issue is becoming urgent because AI systems are no longer simple tools that follow obvious rules. Many modern systems are probabilistic, adaptive, generative and embedded in socially meaningful roles. They write text, make recommendations, classify images, support diagnoses, guide vehicles, assist workers, speak in natural language and increasingly appear as collaborators, advisers, companions or agents.

When users overestimate AI, they may trust flawed outputs, defer to automated recommendations, lose situational awareness or shift responsibility away from themselves. When they underestimate AI, they may reject useful support and lose the benefits of systems that could improve decisions or performance. The key problem is calibration: users need mental models that are neither inflated nor dismissive.

The review links this challenge to a longer history of human-computer interaction research. Previous work showed that people often respond socially to computers even when they know the systems are not human. With modern AI, that tendency becomes more consequential because chatbots, voice assistants, robots and generative systems can use language, simulate personality and appear to reason.

The authors note that users may assign intention, agency, intelligence, warmth or competence to AI systems based on interface cues such as voice, wording, response style, embodiment or apparent social presence. These interpretations can help interaction feel natural, but they can also mislead users about what the system actually understands.

Anthropomorphism is not the only issue. Users may also misunderstand the boundaries of AI decision-making. A medical worker may not know when to trust a diagnostic system. A driver may misjudge an autonomous vehicle's limits. A student may assume a chatbot knows more than it does. A worker may treat an algorithmic recommendation as neutral when it reflects design choices, training data or institutional priorities.

The paper identifies a major gap between the importance of mental models and the state of research studying them. Although mental models are often used to explain trust, reliance and performance in AI systems, the term is defined and measured in many different ways. Some studies treat mental models as users' understanding of system function. Others link them to trust calibration, prediction accuracy, perceived limitations or expectations about AI behavior.

Human-AI interaction is an interdisciplinary field, but the authors warn that it risks moving in parallel tracks rather than building cumulative knowledge. Without clearer conceptual alignment, findings from one area may be difficult to compare with those from another, limiting their usefulness for AI design, governance and public education.

Research is expanding but remains fragmented

The review covered studies from multidisciplinary and technical databases, including Scopus, Web of Science, ACM and IEEE. The search initially identified 301 records, with 52 empirical studies included after screening and full-text review. The first included study appeared in 2018, and publication activity rose in later years, reflecting the growing urgency of the topic as AI systems became more visible and widely used.

The reviewed studies covered a broad range of AI systems. These included conversational agents such as chatbots and voice assistants, autonomous vehicles, diagnostic AI, robots, recommender systems, classifiers, predictive systems and everyday AI applications. Conversational AI was especially prominent, reflecting the rapid rise of large language models and public interest after the release of ChatGPT.

The systems also differed in how they interacted with users. Some were assistive, supporting users while leaving control largely in human hands. Others were collaborative, sharing parts of a task with people. Some were autonomous, operating independently once activated. A smaller group involved competitive AI, where systems acted as opponents in game-like or adversarial settings.

These differences are significant because users do not form the same expectations for every kind of AI. A person may treat an assistive tool as a support system, an autonomous vehicle as an independent decision-maker, a chatbot as a social partner or a classifier as a technical authority. The review argues that mental models depend not only on AI capability, but also on the role the system appears to occupy.

The authors found theoretical diversity across the field. Studies drew on mental model theory, explainable AI, shared mental models, theory of mind, technology acceptance models, anthropomorphism theory, social presence theory, the Computers as Social Actors paradigm, stereotype content models, automation bias, algorithmic aversion and other frameworks.

This range shows that researchers are trying to understand AI interaction from multiple angles: cognition, interface design, trust, social attribution, decision-making, emotion and behavior. However, it also creates a challenge. Many theories assume different things about users and AI. Some focus on reflective understanding, while others emphasize automatic social responses. Some treat AI as a tool, others as an adviser, teammate or social actor.

These assumptions can lead to different design goals. Explainable AI research often assumes that transparency helps users build better mental models, while social cue research may suggest that human-like design makes interaction smoother. These goals can conflict: more human-like AI may feel easier to use while also encouraging unrealistic assumptions about understanding or agency.

The review also finds that methodological practice is uneven. Quantitative methods dominated the field, accounting for 65.4 percent of the reviewed studies. Mixed-method studies made up 25 percent, while purely qualitative studies accounted for only 9.6 percent. Experiments were by far the most common technique.
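
The percentages above can be turned back into approximate study counts with simple arithmetic. The short Python sketch below does this using only figures reported in this article (52 included studies, 301 initial records and the three method shares). The resulting counts are inferred by rounding and are not stated in the article, and the function name implied_counts is purely illustrative.

```python
# Figures reported in the article.
TOTAL_STUDIES = 52
RECORDS_IDENTIFIED = 301
SHARES = {"quantitative": 65.4, "mixed": 25.0, "qualitative": 9.6}  # percent


def implied_counts(total, shares):
    """Convert percentage shares into the nearest whole-study counts.

    The counts are inferred by rounding; they are an assumption,
    not numbers taken from the review itself.
    """
    return {name: round(total * pct / 100) for name, pct in shares.items()}


counts = implied_counts(TOTAL_STUDIES, SHARES)

# Share of initially identified records that survived screening, in percent.
inclusion_rate = round(100 * TOTAL_STUDIES / RECORDS_IDENTIFIED, 1)

print(counts)
print(inclusion_rate)
```

Under these assumptions, the shares correspond to roughly 34 quantitative, 13 mixed-method and 5 purely qualitative studies, and about 17 percent of the records initially identified were included in the review.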

Many studies relied on self-reported measures such as Likert scales, trust questionnaires, user experience ratings and perceived workload tools. These measures are useful for comparison, but they may not capture the full complexity of how people understand AI. Self-reports can also be shaped by response bias, context and users' limited ability to explain their own assumptions.

The authors found fewer studies using deeper qualitative methods such as interviews, phenomenological work or long-term observation. That creates a gap because mental models often develop through experience, breakdowns, repair, surprise and repeated use. A short experiment may show how users respond in one task, but not how their understanding evolves over weeks or months.

Sampling is another concern. Many studies used online recruitment platforms such as Prolific or Amazon Mechanical Turk. These platforms are efficient, but they may overrepresent digitally literate, younger and more research-experienced participants. Several studies also relied on students or college-educated users. The authors argue that this limits understanding of how older adults, lower-literacy users, professionals in high-stakes domains and culturally diverse groups form mental models of AI.

Human-centered AI needs better evidence on user understanding

The review challenges a common assumption in AI design: that better explanations automatically produce better use. Explainability is important, but the authors argue that transparency is not a universal fix. Explanations only help when they match the user's goals, context, task, expertise and need for control.

In some settings, users may need detailed technical explanations. In others, they may need clear boundaries, practical warnings, performance cues or opportunities to override the system. A doctor, driver, student, job applicant and everyday chatbot user will not need the same kind of understanding.

The authors also warn that research often focuses too heavily on performance outcomes, such as task accuracy, speed or efficiency. Those measures matter, but they do not fully capture human-AI interaction. People also experience AI through emotion, agency, responsibility, discomfort, dependence, resistance and social meaning.

AI can change how users understand their own role. In some cases, people may feel empowered by AI support. In others, they may feel displaced, monitored or less responsible for decisions. Mental models therefore shape not only whether users trust AI, but also whether they feel they still have meaningful control.

The review points to human-centered AI as a more useful frame than a simple trade-off between automation and human agency. Higher AI capability does not have to mean lower human control. System design can preserve oversight, intervention and accountability even when AI performs advanced tasks. But users must understand where their authority begins and ends.

The authors call for deeper qualitative and longitudinal research to track how mental models form and change over time. They also call for more naturalistic studies outside controlled lab settings, stronger attention to context, and more inclusive recruitment. AI is now used in homes, schools, hospitals, workplaces and public services, making real-world evidence critical.

The review also urges researchers to clarify what they mean by mental models and how those models differ from related constructs such as trust, expectation, affect and perceived responsibility. These concepts overlap, but treating them as interchangeable weakens the field's ability to build solid theory.

Interfaces should avoid encouraging users to overestimate AI competence, especially when systems use human-like language or social cues. At the same time, systems should not be so opaque that users cannot judge their limits. Better design means helping users understand what AI is for, what it can do, where it fails and when human judgment must intervene.

AI literacy should go beyond teaching people that AI exists or that it uses data. Users need practical understanding of AI's limits, uncertainty, responsibility structures and role in decision-making. Training should be tailored to use context, especially in high-stakes domains such as medicine, education, employment and transportation.

The authors acknowledge that their work focused on studies that explicitly used terms such as mental model or expectation and on English-language empirical research. Relevant studies using related terms such as trust, user understanding, intelligent systems or agents may fall outside that frame.

First published in: Devdiscourse