Doctors Outperform ChatGPT in Fertility Counselling, Highlighting Limits of Medical AI

A study comparing fertility specialists with AI models found that experienced clinicians consistently provided more accurate and reliable counselling for complex fertility questions than ChatGPT or Gemini. While AI can support general information, it currently cannot replace expert clinical judgement in personalized fertility care.


CoE-EDP, VisionRI | Updated: 07-01-2026 09:29 IST | Created: 07-01-2026 09:29 IST

Researchers from leading institutions including the Edith Wolfson Medical Center and Tel Aviv University in Israel, Twig Fertility in Toronto, Shaare Zedek and Chaim Sheba Medical Centers, Dexeus Fertility in Barcelona, Ghent University, McGill University, the University of Melbourne, ANDROFERT and the State University of Campinas in Brazil, Koc University School of Medicine, ART Fertility Clinics in Dubai, and IVIRMA New Jersey set out to examine a pressing question in modern healthcare: can generative artificial intelligence models reliably counsel fertility patients facing complex medical decisions? With patients increasingly turning to AI tools such as ChatGPT and Gemini for medical advice, the study probes whether these systems can match the quality of guidance offered by experienced fertility specialists.

Why Fertility Counselling Is a Tough Test for AI

Fertility medicine involves more than relaying medical facts. Patients often seek advice on emotionally charged, uncertain, and highly individualized situations, such as whether to pursue genetic testing of embryos, how to respond to repeated IVF failure, or whether supplements or additional procedures might improve outcomes. While AI systems have shown impressive performance in exams and diagnostics, counselling requires judgement, context, and experience. The authors argue that this makes fertility care an ideal and demanding test of AI’s real-world clinical usefulness.

How the Study Compared Doctors and AI

To evaluate performance, the researchers created 12 realistic clinical scenarios based on common fertility dilemmas. Two experienced fertility specialists independently wrote responses as they would for their own patients, relying on current evidence and professional judgement. The same questions were then submitted to ChatGPT 4.0 and Gemini 2.0 using an identical prompt and a strict word limit to ensure fairness. Eight internationally recognised fertility experts, blinded to the source of each response, rated all answers on a 10-point scale based on clinical accuracy and appropriateness. This blinded design was intended to ensure that scores reflected the quality of the content rather than the identity or reputation of its author.

What the Results Clearly Showed

The findings strongly favoured human expertise. Responses from the two fertility specialists consistently received the highest scores, with median ratings of 9 and 8 out of 10 respectively. ChatGPT followed with a median score of 7, while Gemini scored substantially lower at 4.5. High-quality ratings were most common for physician-written answers and least common for Gemini. Although ChatGPT occasionally performed as well as a physician on individual questions, these cases were rare. Overall, expert clinicians were far more reliable across the full range of complex fertility scenarios. The reviewers also showed strong agreement with one another, reinforcing the robustness of the results.

What This Means for Patients and the Future of AI

The authors highlight several reasons why AI underperformed. Large language models are trained on vast but opaque datasets that may include outdated or low-quality information. They can also “hallucinate,” generating answers that sound convincing but are clinically inaccurate. Crucially, AI lacks real-time access to updated clinical guidelines and does not truly reason through patient-specific trade-offs. While it can imitate the language of medical advice, it does not possess the experiential judgement that clinicians apply in practice.

The study does not dismiss AI outright. Instead, it positions tools like ChatGPT and Gemini as potential complements to clinical care, useful for general information or patient education. However, when it comes to personalised, evidence-based fertility counselling, experienced specialists remain clearly superior. Until AI systems are better validated, more transparent, and more closely aligned with current clinical evidence, patients should not rely on them as substitutes for professional medical advice. In fertility care, where decisions carry profound emotional and medical consequences, expert human judgement still matters most.

First published in: Devdiscourse