Beyond the doctor’s office: Can AI assist patients in making informed medical decisions?
Artificial Intelligence (AI) has been making significant strides in the healthcare industry, from assisting in diagnostics to managing patient data. However, while AI models like ChatGPT and Claude have demonstrated strong performance on medical knowledge assessments, their effectiveness in supporting real-world patient interactions remains uncertain. Patients and caregivers require not just accurate medical information but also empathetic, clear, and task-relevant guidance that helps them understand diagnoses, prognoses, and treatment options.
A recent study titled “Can Generative AI Support Patients’ & Caregivers’ Informational Needs? Towards Task-Centric Evaluation of AI Systems”, conducted by Shreya Rajagopal, Jae Ho Sohn, Hari Subramonyam, and Shiwali Mohan, published in Joint Proceedings of the ACM IUI Workshops (2025), explores this crucial question. The researchers analyzed the ability of two state-of-the-art generative AI models - ChatGPT-4o and Claude 3.5 Sonnet - to assist patients and caregivers in understanding chest CT scans and radiology reports. Their findings highlight the strengths and limitations of AI in addressing real-world healthcare information needs.
Evaluating AI in a real-world patient care scenario
The study begins by acknowledging a fundamental problem in healthcare: patients and caregivers often struggle to make sense of complex medical reports. While direct consultations with physicians are the gold standard, they are time-consuming and not always accessible. Many patients resort to search engines and online forums, which often provide fragmented or misleading information. AI-driven chatbots have the potential to fill this gap - but can they do so reliably?
To evaluate this, the researchers designed a task-centric study where participants, acting as caregivers, were presented with a chest CT scan and an associated radiology report of a fictitious patient. They then engaged in a real-time conversation with a radiologist to clarify key concerns. Through thematic analysis, the study identified 10 major themes that emerged from these patient-radiologist interactions, including:
- Clarifying medical terminology
- Locating issues in the CT scan
- Understanding disease prognosis
- Comparing treatment options
- Discussing diagnostic follow-up plans
Using these themes, the researchers tested ChatGPT-4o and Claude 3.5 Sonnet, evaluating their ability to provide accurate, relevant, and empathetic responses to patient queries.
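As a rough illustration of what such theme-centric scoring could look like (this is a hypothetical sketch, not the study's actual protocol; the theme names, rubric terms, and scoring function are all illustrative), one could score each model's answer per theme against a small rubric of expected key terms:

```python
# Hypothetical sketch: score model responses per conversation theme.
# The rubric terms and the crude substring-based scoring are
# illustrative placeholders, not the researchers' methodology.

def score_response(response: str, required_terms: list[str]) -> float:
    """Crude relevance score: fraction of expected key terms mentioned."""
    text = response.lower()
    hits = sum(1 for term in required_terms if term in text)
    return hits / len(required_terms)

def evaluate(model_answers: dict[str, str],
             rubric: dict[str, list[str]]) -> dict[str, float]:
    """Score one model's answer for each theme against a term rubric."""
    return {theme: score_response(model_answers.get(theme, ""), terms)
            for theme, terms in rubric.items()}

# Toy example with a single theme and a canned model answer
rubric = {"clarifying medical terminology": ["nodule", "benign"]}
answers = {"clarifying medical terminology":
           "A nodule is a small growth; many nodules are benign."}
print(evaluate(answers, rubric))  # {'clarifying medical terminology': 1.0}
```

A real evaluation would of course rely on expert human judgment rather than keyword matching, as the study did; the sketch only conveys the structure of scoring answers theme by theme.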
AI performance: Strengths and weaknesses in patient communication
The study found significant variability in the quality of AI-generated responses across different themes. While both AI models performed reasonably well in defining medical terminology, they struggled in areas requiring visual understanding and nuanced reasoning - such as helping patients locate specific abnormalities in a CT scan.
One of the key strengths of AI models was their ability to provide concise, easy-to-understand explanations of medical terms. This aligns with their training on large medical datasets. However, their limitations became evident in more complex patient interactions. For instance, while a radiologist naturally contextualized medical findings, AI models often provided generic, textbook-style explanations without fully addressing the patient’s concerns.
Additionally, the accuracy of AI-generated responses was inconsistent. The study revealed error rates of 20% for ChatGPT-4o and 40% for Claude 3.5 Sonnet, meaning that a significant portion of AI-generated answers contained incorrect or misleading information. This is particularly concerning, as patients often lack the medical expertise to distinguish factual responses from AI hallucinations.
Another key finding was the AI models’ tendency toward verbosity - often including irrelevant elaborations that did not directly address the patient’s question. Unlike human radiologists, who provided concise yet informative responses, AI-generated answers often overwhelmed users with too much information or failed to focus on the core concerns.
Challenges and the future of AI in patient-facing healthcare applications
The findings of this study underscore the challenges of deploying generative AI in patient-facing applications. While AI models are improving, they are not yet reliable replacements for direct human consultation. There are three key areas where further research and development are needed:
First, AI models must improve their ability to process multimodal data - particularly in medical imaging. The study found that AI struggled to integrate visual and textual information, making it difficult for patients to locate and understand issues in their CT scans. Future AI systems need enhanced multimodal reasoning capabilities to provide better explanations for image-based medical reports.
Second, AI-generated medical information must be more structured and context-aware. Unlike radiologists, who tailor explanations based on a patient’s concerns, AI often relies on predefined templates. The study suggests that AI models should be designed to engage in dynamic, interactive conversations - adapting their responses based on real-time user feedback.
Third, AI safety and trustworthiness must be improved. Given the high error rates observed in AI-generated responses, there is a significant risk that patients may receive misleading or incorrect medical advice. Future AI systems must incorporate more rigorous fact-checking mechanisms, real-time validation against trusted medical databases, and better alignment with expert human knowledge.
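One very simple form such validation could take (a minimal sketch under stated assumptions - the lexicon, the trusted-glossary set, and the word-level matching are all hypothetical placeholders, not a proposal from the paper) is flagging medical terms in an AI answer that cannot be matched against a trusted reference before the answer is shown to a patient:

```python
# Hypothetical safety-gate sketch: flag medical terms in an AI-generated
# answer that are absent from a trusted reference glossary. Both term
# sets below are illustrative stand-ins for real curated resources.
import re

MEDICAL_LEXICON = {"nodule", "granuloma", "effusion",
                   "fibrosis", "pleurodynia"}
TRUSTED_GLOSSARY = {"nodule", "granuloma", "effusion", "fibrosis"}

def flag_unverified(answer: str) -> set[str]:
    """Return medical terms in the answer not found in the trusted glossary."""
    words = set(re.findall(r"[a-z]+", answer.lower()))
    return (words & MEDICAL_LEXICON) - TRUSTED_GLOSSARY

answer = "The scan shows a small nodule and possible pleurodynia."
print(flag_unverified(answer))  # {'pleurodynia'}
```

Production systems would need far more than keyword lookup - for example, claim-level verification against clinical knowledge bases - but the sketch illustrates the gating idea: unverified content is surfaced for review rather than passed silently to the patient.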
The road ahead: AI as a complement, not a replacement, in healthcare
Despite these challenges, the study highlights the potential of AI as a complementary tool in healthcare. Rather than replacing medical professionals, AI can serve as a bridge between patients and doctors, offering preliminary information that helps patients prepare for consultations. By providing clear explanations of medical terminology, summarizing treatment options, and answering basic patient questions, AI can enhance health literacy and reduce patient anxiety.
However, for AI to be safely integrated into healthcare, developers, medical professionals, and policymakers must work together to establish standards for AI accuracy, reliability, and ethical use. AI systems must be designed to augment, not replace, human expertise, ensuring that patients receive accurate, personalized, and context-aware medical information.
This research represents an important step in evaluating AI through a patient-centric lens - shifting from traditional AI performance benchmarks to real-world usability and reliability. As generative AI continues to evolve, future studies must focus on developing AI models that align with human expertise, prioritize safety, and truly address the informational needs of patients and caregivers.
First published in: Devdiscourse

