AI skin cancer models built on western data may fail diverse Asian patients

CO-EDP, VisionRI | Updated: 08-04-2025 09:53 IST | Created: 08-04-2025 09:53 IST

A new wave of artificial intelligence tools promises faster, more accurate skin cancer detection, but a first-of-its-kind systematic review suggests that many of these systems may not be built to serve Asia, where skin tones, cancer subtypes, and healthcare access diverge sharply from those of Western populations.

Published in Diagnostics under the title “The Use of Artificial Intelligence for Skin Cancer Detection in Asia—A Systematic Review”, the study investigates whether AI models trained primarily on Western data can deliver accurate, equitable care across diverse Asian populations.

How well does AI detect skin cancer in Asian populations?

Of the 3,113 studies screened, only 22 met the strict criteria for inclusion in this review. All included studies either developed or validated AI algorithms using datasets from Asian countries such as China, South Korea, Japan, Taiwan, and Iran. Deep learning techniques, especially convolutional neural networks (CNNs), dominated the field, with Inception-ResNet-v2 among the most frequently deployed models. Many models achieved accuracy rates exceeding 90% in binary classification tasks distinguishing benign from malignant lesions, sometimes outperforming non-specialists and, in isolated cases, even dermatology experts.

However, diagnostic performance varied significantly depending on the cancer subtype, the imaging modality (clinical versus dermoscopic images), and whether AI was applied as a standalone diagnostic tool or as an aid. For instance, some AI models showed a decrease in accuracy when exposed to dual modalities, indicating that more data input did not always lead to better diagnostic outcomes. Others demonstrated superior sensitivity when detecting acral melanoma, a subtype more common in Asian populations, yet struggled with squamous cell carcinoma or rare malignancies, reflecting training set limitations.
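As a rough illustration of how such binary benign-versus-malignant results are scored, the sketch below computes accuracy, sensitivity, and specificity from confusion-matrix counts. The counts are invented for illustration only and do not come from the review:

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics for a benign-vs-malignant classifier."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # share of malignant lesions caught
    specificity = tn / (tn + fp)   # share of benign lesions correctly cleared
    return accuracy, sensitivity, specificity

# Hypothetical counts: 100 malignant and 100 benign lesions
acc, sens, spec = binary_metrics(tp=90, fp=8, tn=92, fn=10)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
# → accuracy=0.91 sensitivity=0.90 specificity=0.92
```

The split matters because a headline "90% accuracy" can hide low sensitivity on exactly the subtypes, such as acral melanoma, where missed diagnoses are costliest.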

Moreover, AI support improved the accuracy of general practitioners and dermatology trainees but offered limited benefit to seasoned dermatologists. This suggests AI may be most useful as an augmentative tool in primary care or underserved settings, rather than as a replacement for experienced clinicians in specialized contexts.

Are current AI models truly representative of Asia’s diverse skin types and cancer profiles?

A core concern raised in the review is the lack of diversity and scale in Asian dermatology datasets. Unlike Western-developed models, which are trained on large public repositories such as ISIC and HAM10000 that are heavily weighted toward lighter Fitzpatrick skin types, the Asian studies relied mostly on small institutional or private datasets. This not only reduces the generalizability of AI tools but also risks introducing bias into models that are supposed to enhance healthcare equity.

The review points to a pronounced research gap: the underrepresentation of darker-skinned populations from South and Southeast Asia. Most included studies originate from East Asian countries, particularly China, South Korea, and Japan, with limited data from India, Malaysia, and Indonesia, regions with markedly different cancer prevalence and presentation patterns.

Furthermore, fewer than half of the reviewed studies used histopathological diagnosis as the gold standard. The rest relied on clinical consensus or did not specify their reference criteria. Without standardized validation and transparent benchmarking, even high reported accuracy rates may not reliably reflect real-world performance.

Transfer learning, in which Western-developed models are retrained on Asian datasets, was a common strategy and often yielded better results than locally developed, small-scale models. Yet the authors caution that transfer learning may fail to capture the unique clinical features of Asian-specific cancer subtypes, such as acral lentiginous melanoma or pigmented basal cell carcinoma, which manifest differently and occur at less typical anatomical sites.

What needs to change to ensure AI can support equitable skin cancer diagnostics in Asia?

The study’s authors call for urgent investment in large-scale, publicly accessible Asian image databases that reflect the full spectrum of regional skin types, lesion types, and image modalities. They argue this is essential not only for training robust and generalizable AI models but also for ensuring fairness, safety, and transparency in deployment.

Beyond data limitations, practical challenges such as computational demands, training time, and inference speeds were noted. CNNs and transformer-based models offer high accuracy but require considerable processing power, which may be infeasible in low-resource settings where early diagnosis is most needed. The absence of AI models trained using Asian dermoscopic databases is particularly problematic, given the growing role of dermoscopy in early melanoma detection.

Legal and ethical concerns also loom large. AI models trained without stringent oversight or explainability mechanisms can propagate bias or generate misleading outputs. In some studies, clinician performance dropped when AI provided incorrect predictions, suggesting a risk of over-reliance among less experienced practitioners. Regulatory standards for data sharing, model auditing, and clinical accountability remain underdeveloped across much of Asia.

Lastly, the paper emphasizes that AI is not a panacea. It must complement, not replace, clinical expertise. Real-world deployment should involve rigorous prospective validation and adaptive learning, incorporating multimodal data inputs and continual feedback loops. The researchers advocate for collaboration among dermatologists, computer scientists, and public health authorities to design systems tailored to patient-facing, primary care, or specialist use cases.

  • FIRST PUBLISHED IN:
  • Devdiscourse