AI vs misinformation: How large language models are verifying biomedical claims
In the era of rapid scientific advancements and increasing misinformation, verifying biomedical claims has become essential for healthcare decision-making, public health policies, and scientific research. False or misleading medical claims can lead to misinformed treatments, policy failures, and general distrust in medical institutions. To tackle this challenge, researchers are leveraging Large Language Models (LLMs) to create explainable, AI-driven biomedical claim verification systems that offer greater transparency, accountability, and reliability.
A recent study titled “Explainable Biomedical Claim Verification with Large Language Models” by Siting Liang and Daniel Sonntag, published in the Joint Proceedings of the ACM IUI Workshops (2025), presents a system that integrates natural language inference (NLI), transparent model explanations, and user-guided justifications to improve biomedical claim verification. The system lets users retrieve relevant scientific studies, examine how LLMs process the evidence, and verify claims through interactive, explainable AI-powered reasoning.
Role of AI in biomedical claim verification
Traditional biomedical claim verification relies on expert review of the scientific literature, a time-consuming and resource-intensive process. AI models, particularly LLMs fine-tuned for NLI, offer a more efficient way to assess claims. These models compare a biomedical assertion against retrieved scientific studies and classify it into one of three categories: “Support,” “Contradict,” or “Not Enough Information.”
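To make the classification step concrete, here is a minimal sketch of evidence-conditioned claim verification using an off-the-shelf NLI model from Hugging Face. The model choice (microsoft/deberta-large-mnli), the label mapping, and the example claim are illustrative assumptions, not the system described in the paper.

```python
# Minimal sketch: classify a claim against retrieved evidence with an
# off-the-shelf NLI model. Model and example are illustrative only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "microsoft/deberta-large-mnli"  # any MNLI-style checkpoint works similarly
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

# Map the model's MNLI labels onto the article's three verification categories.
TO_VERDICT = {
    "ENTAILMENT": "Support",
    "CONTRADICTION": "Contradict",
    "NEUTRAL": "Not Enough Information",
}

def verify(claim: str, evidence: str) -> str:
    """Treat the evidence as the premise and the claim as the hypothesis."""
    inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    label = model.config.id2label[logits.argmax(dim=-1).item()]
    return TO_VERDICT[label.upper()]

evidence = "In the trial, drug X reduced systolic blood pressure by 12 mmHg."
print(verify("Drug X lowers blood pressure.", evidence))  # expected: Support
```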
The study introduces a Chain of Evidential Natural Language Inference (CoENLI) framework, which guides LLMs to generate structured, evidence-based explanations before committing to a final classification. The framework is designed to make AI-driven decisions both accurate and interpretable, allowing users to trace the reasoning behind each claim verification.
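The sketch below captures the general shape of such two-step, evidence-first prompting: one call produces per-sentence evidential judgments, and a second call aggregates them into a verdict. The prompt wording, the aggregation step, and the use of the OpenAI client with gpt-4o-mini are assumptions for illustration, not the paper's exact templates.

```python
# A two-step, evidence-first prompting sketch in the spirit of CoENLI.
# Prompt wording, aggregation, and model choice are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

STEP1 = (
    "Claim: {claim}\n"
    "Evidence sentences:\n{evidence}\n"
    "For each evidence sentence, state whether it supports, contradicts, or is "
    "irrelevant to the claim, with a one-sentence reason."
)
STEP2 = (
    "Per-sentence judgments:\n{judgments}\n"
    "Based only on these judgments, answer with exactly one of: "
    "Support, Contradict, Not Enough Information."
)

def ask(prompt: str) -> str:
    """Single chat completion call; gpt-4o-mini is one of the models the article names."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

def verify_with_trace(claim: str, evidence_sentences: list[str]) -> tuple[str, str]:
    """Return the final label plus the intermediate judgments that justify it."""
    evidence = "\n".join(f"- {s}" for s in evidence_sentences)
    judgments = ask(STEP1.format(claim=claim, evidence=evidence))
    label = ask(STEP2.format(judgments=judgments))
    return label, judgments
```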
To make the model’s behavior more transparent, the system also integrates SHAP (SHapley Additive exPlanations) values, which quantify how much each word in a claim contributes to the model’s final decision. This transparency helps users understand why the AI reached a particular conclusion, making the verification process more trustworthy and accountable.
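As a rough illustration of word-level attributions (not the paper's implementation), SHAP can be pointed at a Hugging Face text-classification pipeline. The model name, the single-string “evidence [SEP] claim” input packing, and the plotting call are all assumptions made for this sketch.

```python
# Illustrative only: token-level SHAP attributions for an NLI-style classifier.
# Model choice and input formatting are assumptions, not the authors' setup.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="microsoft/deberta-large-mnli",
    top_k=None,  # return scores for every label, which SHAP expects
)

explainer = shap.Explainer(clf)

# Crude packing of evidence and claim into one string for the text masker.
texts = ["Drug X reduced systolic blood pressure by 12 mmHg. [SEP] Drug X lowers blood pressure."]
shap_values = explainer(texts)

# Positive values push the prediction toward a label, negative values away from it.
print(shap_values.values.shape)   # (n_examples, n_tokens, n_labels)
shap.plots.text(shap_values)      # highlighted-text view of word contributions
```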
Evaluation and performance of the explainable AI system
The study evaluates the CoENLI framework using two biomedical benchmarks: NLI4CT (Natural Language Inference for Clinical Trials) and SciFact (Scientific Fact Verification). These datasets require AI models to process complex biomedical claims and assess their validity based on real-world clinical trials and scientific studies.
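For intuition about how such benchmark-style scoring works, a minimal sketch follows; the record format and the placeholder verifier are assumptions, not the paper's evaluation code.

```python
# Sketch of benchmark-style scoring over (claim, evidence, label) records.
# The record format and the trivial baseline verifier are assumptions.
from sklearn.metrics import accuracy_score, f1_score

def evaluate(records, verify):
    """records: iterable of dicts with 'claim', 'evidence', and gold 'label'."""
    gold = [r["label"] for r in records]
    pred = [verify(r["claim"], r["evidence"]) for r in records]
    return {
        "accuracy": accuracy_score(gold, pred),
        "macro_f1": f1_score(gold, pred, average="macro"),
    }

sample = [{
    "claim": "Drug X lowers blood pressure.",
    "evidence": "Drug X reduced systolic blood pressure by 12 mmHg.",
    "label": "Support",
}]
print(evaluate(sample, lambda claim, evidence: "Support"))  # trivial baseline for demo
```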
The researchers compared different prompting approaches (sketched as prompt templates after the list):
- Simple Prompting: basic claim classification with no intermediate reasoning.
- Zero-Shot Chain of Thought (CoT): step-by-step reasoning without structured guidance.
- CoENLI (the proposed method): structured, evidence-based inference with detailed explanations.
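The templates below illustrate how the three strategies differ in what they ask of the model; the wording is a hedged approximation, not the prompts used in the study.

```python
# Illustrative prompt templates for the three strategies; the wording is an
# approximation, not the study's actual prompts.
SIMPLE_PROMPT = (
    "Claim: {claim}\nEvidence: {evidence}\n"
    "Answer with Support, Contradict, or Not Enough Information."
)

ZERO_SHOT_COT_PROMPT = (
    "Claim: {claim}\nEvidence: {evidence}\n"
    "Let's think step by step, then answer with Support, Contradict, "
    "or Not Enough Information."
)

COENLI_STYLE_PROMPT = (
    "Claim: {claim}\nEvidence sentences:\n{evidence}\n"
    "First judge each evidence sentence separately (supports / contradicts / "
    "irrelevant) with a one-sentence reason, then combine the judgments into a "
    "final answer: Support, Contradict, or Not Enough Information."
)
```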
Results showed that CoENLI outperforms the baseline prompting strategies, achieving:
- Higher accuracy in claim verification across both biomedical datasets.
- More consistent and interpretable justifications for AI-driven decisions.
- Improved user agreement and trust in AI-assisted claim verification.
The integration of multiple LLMs, including GPT-4o-mini, Llama 3.1, and Mistral-12B, further demonstrated the potential for balancing accuracy, efficiency, and transparency in biomedical AI systems.
Challenges and future improvements in AI-driven medical verification
While the study highlights the strengths of LLMs in claim verification, it also acknowledges key challenges:
One major issue is model interpretability and trust. While CoENLI improves explainability, some AI-generated justifications may still lack domain-specific nuance. Future research should explore fine-tuned AI models trained specifically on biomedical literature to enhance reasoning capabilities.
Another challenge is computational efficiency. The use of large-scale LLMs, such as GPT-4o-mini, demands high computational power, making deployment difficult in resource-constrained environments. To address this, researchers propose hybrid models that combine lightweight LLMs with fine-tuned medical knowledge bases.
Additionally, ensuring AI fairness and bias reduction remains a priority. The study highlights that small variations in claim wording can sometimes lead to inconsistencies in verification results. By refining training data diversity and integrating human feedback loops, AI systems can become more robust and reliable.
Future of AI-powered biomedical fact-checking
The integration of LLMs, explainable AI, and evidence-based verification represents a major leap forward in biomedical claim validation. This research not only advances AI-assisted decision-making in healthcare and research but also sets the foundation for more accountable, human-AI collaboration frameworks.
Looking ahead, the researchers propose further optimization of LLM reasoning strategies, enhanced feedback mechanisms, and integration with broader evidence synthesis frameworks. As AI technology continues to evolve, its role in fighting medical misinformation, supporting clinical research, and aiding policy decisions will become increasingly vital.
By making biomedical claim verification more transparent, explainable, and reliable, this study paves the way for trustworthy AI applications in the healthcare sector, ensuring that scientific knowledge remains accurate, credible, and accessible to all.
First published in: Devdiscourse

