GPTs, chatbots and machine learning drive new wave of AI clinical trial records

Registered AI-related clinical trials are rising sharply, with China and the United States leading the global count, according to a preprint submitted to arXiv.

The study, titled "Trends in AI and Human-AI Interaction in Clinical Trials – A Hybrid Human-AI Exploration", was prepared for the Workshop on Health, Wellbeing and Human-AI Interaction at the Hybrid Human-Artificial Intelligence Conference, HHAI2026. It analyzes AI-related records in ClinicalTrials.gov, the world's largest public clinical trial registry, and tests whether a hybrid workflow using a frontier generative AI model and human review can help classify how AI is used in clinical trials.

The authors searched ClinicalTrials.gov using a broad AI-focused search string covering AI, machine learning, deep learning, computer vision, natural language processing, neural networks, expert systems, chatbots, ChatGPT, GPT and large language models.
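As a rough illustration, the listed terms could be combined into a single boolean expression. The sketch below rests on assumptions: the article does not reproduce the authors' exact search string, `build_query` is a hypothetical helper, and the OR-joined, phrase-quoted form is only one plausible way to write such a query. It builds a string and does not contact ClinicalTrials.gov.

```python
# Terms as listed in the article; the authors' exact search string is not given there.
AI_TERMS = [
    "AI", "machine learning", "deep learning", "computer vision",
    "natural language processing", "neural networks", "expert systems",
    "chatbots", "ChatGPT", "GPT", "large language models",
]


def build_query(terms):
    """Join terms with OR, quoting multi-word phrases (hypothetical helper)."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return " OR ".join(quoted)


query = build_query(AI_TERMS)
print(query)
```

A broad OR query of this kind favors recall over precision, which is consistent with the later screening step that removes records not substantively using AI.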

AI terminology is expanding across clinical trials

The study identifies a major shift in the language used to describe AI in clinical trials. Earlier AI-related terms, including expert systems, appear in older records, but more recent trials increasingly use terms associated with modern data-driven AI. References to artificial intelligence, AI, machine learning and deep learning have surged over the past decade, while terms tied to chatbots, GPT and large language models have grown particularly quickly since the release and public adoption of newer conversational AI systems.

The term AI returned the largest number of unique records, followed by artificial intelligence, machine learning and deep learning. Chatbot, GPT, neural network and expert system also retrieved substantial numbers of unique records, showing that no single term can capture the full range of AI-related clinical research.

This matters because the clinical AI field is fragmented in both technology and language. A study may describe a system as machine learning, computer vision, a chatbot, an algorithm, an expert system or a decision-support tool. In some cases, AI may be embedded inside a broader digital intervention, making it harder to determine whether AI is central to the trial or only a background component.

The paper highlights an important reporting problem. Trial registry records often do not clearly define the AI method, the role of the system, the data inputs, the users involved or the level of human-AI interaction. The lack of detail creates barriers for researchers trying to compare AI trials, assess safety and fairness, and understand how AI is being tested in real clinical settings.

The problem goes beyond AI. Similar gaps have appeared in digital health and wearable technology studies, where trial records often leave out basic details about the version tested, what the technology actually does, how well it performs and where it is meant to be used. With AI, those gaps can carry greater risk because patient outcomes depend not only on the model's accuracy, but also on how it fits into real clinical workflows.

The authors note that existing reporting guidance, including SPIRIT-AI for AI trial protocols and CONSORT-AI for completed randomized trial reports, was developed to improve transparency. However, compliance remains incomplete in published research, meaning AI trial descriptions still often lack the information needed to understand how systems interact with patients, clinicians or other users.

China and the United States lead AI trial growth

The geographical pattern of AI-related clinical trials shows clear concentration. China and the United States accounted for the largest numbers of AI-classified trials in the dataset, with each country recording roughly four times as many trials as the next leading country. Italy ranked third, followed by countries including France, Spain, the United Kingdom, Turkey, Taiwan, Germany, South Korea, India, Canada, the Netherlands, Singapore and Japan.

According to the study, the number of AI trials in China rose sharply from 2018 and overtook the United States. Both countries now dominate registered AI-related clinical research, reflecting their wider investment in AI, digital health, hospital technology and biomedical research infrastructure.

The global spread of AI trials is still widening. Several European and Asian countries showed notable increases in recent years, suggesting that AI clinical research is moving beyond early hubs into a broader international phase. However, the paper notes that many records did not declare a location, which limits the precision of the geographical analysis.

The dataset included both interventional and observational studies. Of the 5,828 returned records, 3,019 were interventional studies and 2,807 were observational studies, while two were expanded access records. The researchers retained observational studies because the goal was to understand AI use in clinical research records rather than restrict the analysis to a single trial design.
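The reported counts can be checked with a few lines of arithmetic. The snippet below uses only the figures quoted above; the variable names are illustrative. It confirms that the three study types add up to 5,828 and shows that interventional and observational studies make up roughly 51.8% and 48.2% of the records.

```python
# Counts as reported in the article.
counts = {
    "interventional": 3019,
    "observational": 2807,
    "expanded access": 2,
}

total = sum(counts.values())

# Share of each study type as a percentage of all returned records.
shares = {kind: round(100 * n / total, 2) for kind, n in counts.items()}

print(total)
for kind, share in shares.items():
    print(f"{kind}: {share}%")
```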

The paper's geographic findings add to earlier evidence that registered AI and machine learning clinical studies have increased substantially since 2010. But the authors extend that work by focusing not only on AI use, but also on human-AI interaction, an area they argue is still underdeveloped in healthcare research.

Human-AI interaction is important because AI systems in healthcare rarely operate in isolation. A diagnostic tool may guide clinicians, a chatbot may communicate with patients, a monitoring system may alert care teams, or a decision-support platform may shape treatment planning. In each case, the clinical outcome depends partly on how people receive, interpret and act on AI outputs.

The study sorted records into interaction categories, including no AI use, AI use without human-AI interaction, patient-AI interaction, caregiver-AI interaction, health professional-AI interaction, other human-AI interaction and hybrid-AI interaction involving more than one type of user.
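One way to picture this scheme is as a fixed set of labels. The sketch below is an assumption about how the categories named in the article might be encoded; the `Interaction` enum and the two helper functions are hypothetical and not taken from the paper.

```python
from enum import Enum


class Interaction(Enum):
    """Categories as named in the article (this encoding is illustrative)."""
    NO_AI_USE = "no AI use"
    AI_WITHOUT_INTERACTION = "AI use without human-AI interaction"
    PATIENT = "patient-AI interaction"
    CAREGIVER = "caregiver-AI interaction"
    HEALTH_PROFESSIONAL = "health professional-AI interaction"
    OTHER = "other human-AI interaction"
    HYBRID = "hybrid-AI interaction"  # more than one type of user


def uses_ai(label):
    """True for every category except 'no AI use'."""
    return label is not Interaction.NO_AI_USE


def has_interaction(label):
    """True when some human group interacts with the AI system."""
    return label not in {Interaction.NO_AI_USE, Interaction.AI_WITHOUT_INTERACTION}


print(has_interaction(Interaction.HEALTH_PROFESSIONAL))
```

Separating "does the trial use AI" from "does anyone interact with it" mirrors the two questions the reviewers had to answer for each record.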

Hybrid review shows promise, but trial reporting remains a barrier

The study also examined a hybrid human-AI workflow for screening and classifying clinical trial records. The researchers used GPT-5.5 through the OpenAI API to classify trial records and provide concise explanations and confidence ratings. Human reviewers then classified a representative random sample of 100 records, with a third reviewer resolving disagreements.
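The human side of this workflow can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: it treats the sample as a simple random draw without replacement, lets the third reviewer's label stand wherever the first two disagree, and uses hypothetical function names and made-up toy labels. The article does not describe the authors' sampling procedure or adjudication rules in this detail.

```python
import random


def draw_sample(record_ids, k=100, seed=0):
    """Simple random sample without replacement (the seed is an arbitrary assumption)."""
    return random.Random(seed).sample(list(record_ids), k)


def adjudicate(first, second, third):
    """Combine two reviewers' labels, deferring to a third on disagreement.

    Each argument maps a record id to a label. The third mapping only
    needs entries for records where the first two reviewers disagree.
    """
    final = {}
    for record_id, label in first.items():
        if second[record_id] == label:
            final[record_id] = label
        else:
            final[record_id] = third[record_id]
    return final


# Toy labels, invented purely to exercise the functions (not study data).
reviewer_1 = {"rec-1": "no AI use", "rec-2": "patient-AI interaction"}
reviewer_2 = {"rec-1": "no AI use", "rec-2": "health professional-AI interaction"}
reviewer_3 = {"rec-2": "health professional-AI interaction"}

sample = draw_sample(range(5828))
print(len(sample))
print(adjudicate(reviewer_1, reviewer_2, reviewer_3))
```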

The results suggest that AI-assisted screening can help identify records that do not substantively use AI. In the sample of 100 records, both human and AI classifiers identified 14 trials as not using AI, with disagreement in two cases. Those two cases involved uncertainty, and the AI system also reported lower confidence in them.

Classifying human-AI interaction was a more difficult task. Among records where AI use was accepted, there were frequent disagreements about whether interaction occurred and which human group interacted with the system. The most common disagreement involved whether a trial should be classified as health professional-AI interaction or no human-AI interaction.

The uncertainty reflects the limits of trial records. Some descriptions mention an AI system but do not clarify whether clinicians directly receive AI outputs, whether AI runs in the background, whether patients interact with a tool, or whether AI only processes data outside the clinical workflow. In complex interventions, a system may affect decisions without clear documentation of who engages with it.

Human classifiers sometimes abstained when records lacked enough information, particularly when trying to categorize human-AI interaction. This points to a core challenge for both human and machine review: classification quality depends on reporting quality.

The authors note that AI-assisted review is not cost-free. Their experimental workflow involved substantial API use, costing $75.17 for the main computation, and pilot work used a paid ChatGPT Pro account. The paper also flags the environmental cost of AI computation, estimating that energy use could amount to some tens of kilowatt-hours, with associated carbon dioxide emissions.

The findings further point to several policy and research needs. Clinical trial records should more clearly specify whether AI is used, what type of AI is involved, whether users interact with it, who those users are, what expertise is required and how AI outputs are integrated into the intervention or workflow. Clearer interaction categories would also help reviewers compare trials and assess clinical risk.

First published in: Devdiscourse
