Human judgment can weaken when AI answers feel too convincing
Large language models (LLMs) may make people feel more certain about their decisions even when the evidence behind that certainty has not improved, according to a new conceptual review by Guy Hochman of Reichman University. The paper argues that AI systems can act like a "smart mirror," reflecting users' assumptions back to them in fluent, authoritative language that may feel like independent validation.
The review, "Talking to Ourselves Through a Smart Mirror: Artificial Confidence in Human–AI Interaction," published in the journal Systems, develops the concept of "artificial confidence" to explain a growing risk in human-AI interaction. The paper warns that LLMs can support writing, reasoning, translation and decision-making, but they can also create unwarranted certainty when users bring prior beliefs, leading prompts, low verification and a desire for quick closure into the interaction.
AI confidence risk starts with the user, not just the model
The paper challenges the assumption that easier access to information automatically improves judgment. This view, the author argues, is incomplete because AI systems do not interact with neutral users. They interact with people who already have beliefs, preferences, time pressure, emotional investment and different levels of willingness to check what the system says.
Artificial confidence, as the review defines it, is not just ordinary overconfidence, but a confidence that emerges from the structure of the human-AI exchange. A user asks a question, often with a certain frame already built into it. The model then responds in smooth, confident and seemingly expert language. The user may experience that response as external support, even though the answer has been shaped by the original prompt.
This is different from traditional cognitive bias. In ordinary confirmation bias, people search for and interpret information in ways that support what they already believe. In LLM interaction, the user's preferred framing can influence the machine's output, which then returns as something that appears independent and objective.
According to the review, the danger is not that LLMs always mislead people, but that they can make weakly checked judgments feel stronger than they are. The system may not add real evidence, but may simply turn a user's partial belief into a polished answer that feels more reliable because it came from a machine.
This is why Hochman uses the smart mirror metaphor. LLMs do not merely repeat what users say; they can reorganize, translate, refine and strengthen it. In many cases, however, the quality of the answer still depends heavily on the question asked, the assumptions supplied and the uncertainty the user is willing to tolerate.
Fluency, agreement and speed can weaken verification
The review identifies several features of LLMs that can make artificial confidence more likely. One is fluency. AI-generated answers are often clean, organized and easy to read. That can help users understand complex material, but it can also make uncertain claims feel more settled than they are.
Prompt sensitivity is another feature. Small changes in wording can change what the model emphasizes, what evidence it includes and how strongly it presents a conclusion. A balanced prompt may lead to a more balanced answer. A leading prompt may produce a response that supports the user's preferred view while still sounding reasonable.
The review also points to sycophancy, where AI systems may validate or accommodate user assumptions instead of challenging them. This does not always mean obvious flattery. It can happen when a model accepts a questionable premise, gives more attention to supporting evidence or softens disagreement in order to remain helpful.
This pattern can reduce what the paper calls epistemic vigilance, the ordinary habit of checking whether a claim deserves belief. In human conversation, people often judge the speaker's expertise, motive and reliability. With an AI chatbot, those cues are less visible. The output can feel neutral, impersonal and data-driven, even when it is shaped by training data, prompt wording, model design and user input.
The result is a risk of delegated judgment. Users may use AI not simply to gather information but to reduce the burden of deciding for themselves. When the system produces a fluent answer, the user may stop searching, stop comparing sources or stop asking what evidence would change the conclusion.
The review also warns that repeated use can turn this into a feedback loop. A user asks a leading question, receives validating output, becomes more confident, verifies less and then asks narrower follow-up questions. Over time, AI use may normalize lower scrutiny, particularly in workplaces or institutions where AI-assisted answers become part of routine decision-making.
LLMs can improve judgment when friction remains
The author notes that LLMs can improve judgment when used in ways that preserve friction, uncertainty and accountability. They can help users identify missing information, test arguments, compare alternatives, translate difficult material, improve drafts and encounter perspectives they may have missed. The difference lies in how the tool is used. If the user treats the answer as a hypothesis to examine, the system can support better reasoning, but if it is treated as a verdict, it can strengthen premature certainty.
The review identifies several conditions that make AI more likely to help. The task should be clearly defined, external standards for verification should exist and the user should have enough expertise or support to evaluate the answer. The interaction should also encourage counterarguments, source checking, uncertainty awareness and exposure to evidence that could challenge the user's first view.
On the other hand, artificial confidence is more likely when tasks are ambiguous, evidence is hard to check, users hold strong prior beliefs and the AI response is fluent but weak on provenance. The risk also rises when verification is optional, time-consuming or unsupported by the surrounding institution.
The review also introduces a related problem on the opposite side: users may sometimes reject AI help even when it could improve their judgment. Hochman frames this as another form of miscalibration. One user may trust AI too much because it feels validating. Another may distrust it too much because it feels threatening, impersonal or professionally uncomfortable. In both cases, the user's response is not properly tied to the quality of the information.
The paper argues that the goal should not be blind trust or blanket skepticism. It should be disciplined engagement. AI outputs should be treated as useful but fallible inputs that need testing against evidence, context and independent standards.
Implications and limitations for AI design and governance
The paper calls for interaction designs that keep users engaged in judgment. This could include clearer uncertainty signals, source provenance, prompts that ask users to consider counterarguments, tools that show when an answer depends heavily on the user's framing and friction points before high-stakes acceptance.
Generic warnings that AI can make mistakes are unlikely to be enough. The review argues that uncertainty must be tied to specific claims, not buried in broad disclaimers. Users need to know whether an answer is based on strong evidence, contested evidence, inference or speculation.
Institutions also have a role. In medicine, law, education, management, policy work and research, AI-supported judgments may need audit trails, verification protocols and clear accountability rules. The review warns that responsibility can become blurred when users, developers and organizations all treat AI as helpful while formally disclaiming full reliance on it.
The paper also calls for what it describes as judgment literacy. Users should learn not only how to prompt AI tools, but how to question them, verify them and explain why an AI-supported conclusion was accepted or rejected. In schools and workplaces, that means rewarding evidence checking and revision, not only polished final output.
The review's scope is limited because it is a conceptual synthesis, not an empirical study testing artificial confidence directly. The author draws on existing research in judgment, motivated reasoning, automation bias, processing fluency, epistemic vigilance, LLM sycophancy and systems thinking, but the central framework still needs direct testing.
The paper also focuses mainly on text-based conversational AI. Other systems, including multimodal models, autonomous agents, recommendation engines and clinical decision tools, may create related risks through different mechanisms. The author notes that user differences also matter. Expertise, epistemic humility, prior beliefs, task pressure and institutional context can all affect whether AI helps judgment or inflates confidence.
First published in: Devdiscourse