AI in medicine gets a boost from synthetic data, but only the smart kind

A new peer-reviewed study has found that synthetic medical data can enhance the performance and transparency of artificial intelligence algorithms in healthcare, but only under specific conditions. Published in Electronics, the study evaluates how machine learning and deep neural network models perform when trained on real, synthetic, and hybrid datasets, revealing both potential benefits and risks tied to data sensitivity and explainability.
The study, titled "The Explanation and Sensitivity of AI Algorithms Supplied with Synthetic Medical Data" and conducted by researchers from Dunărea de Jos University of Galați, Romania, tests the effectiveness of synthetic data using two widely used medical datasets: the Pima Indians Diabetes Dataset (PIDD) and the Breast Cancer Wisconsin Diagnostic Dataset (BCWD). Using a range of machine learning models and custom-built neural networks, the authors compared model performance across multiple scenarios, including real-only data, synthetic-only data, and combined configurations.
Synthetic medical data, typically used to address privacy restrictions or to balance underrepresented cases in training data, is increasingly common in healthcare AI. However, the study warns that improper use or poor-quality generation techniques can introduce distortions, undermining model performance and producing misleading outputs. The researchers found that in some cases, synthetic data significantly improved classification accuracy. In others, it degraded results or misaligned model interpretations, raising questions about reliability.
For the BCWD dataset, models trained solely on real data outperformed all synthetic and hybrid variants. Random forest classifiers using the original data achieved an accuracy of 97.2%, while the same models trained on Gaussian Copula Synthesizer (GCS) data dropped to 73.4%, and models trained on hybrid data reached only 86%. This suggested that the original dataset was already well-distributed, and that artificial augmentation was unnecessary, and potentially harmful, in this context.
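The article does not say which implementation of the Gaussian copula approach the researchers used. The sketch below shows one common way to reproduce this kind of real-versus-synthetic-versus-hybrid comparison, assuming the open-source SDV library for the copula model and scikit-learn for the classifier; the file path and column names are illustrative, and the paper's exact accuracies will not be reproduced.

```python
# Hedged sketch of a real vs. GCS-synthetic vs. hybrid comparison.
# Assumes the SDV library for the Gaussian copula model and scikit-learn
# for the random forest; file/column names are illustrative.
import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

real = pd.read_csv("bcwd.csv")  # Breast Cancer Wisconsin data (illustrative path)
train, test = train_test_split(real, test_size=0.2, random_state=42,
                               stratify=real["diagnosis"])

# Fit a Gaussian copula to the real training split, then sample a
# same-sized synthetic set from the learned joint distribution.
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(train)
synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.fit(train)
synthetic = synthesizer.sample(num_rows=len(train))

def held_out_accuracy(df: pd.DataFrame) -> float:
    """Train a random forest on df; always evaluate on the real test split."""
    clf = RandomForestClassifier(n_estimators=200, random_state=42)
    clf.fit(df.drop(columns="diagnosis"), df["diagnosis"])
    preds = clf.predict(test.drop(columns="diagnosis"))
    return accuracy_score(test["diagnosis"], preds)

print("real only: ", held_out_accuracy(train))
print("GCS only:  ", held_out_accuracy(synthetic))
print("hybrid:    ", held_out_accuracy(pd.concat([train, synthetic])))
```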
Results differed, however, on the diabetes dataset. The PIDD data, known for its class imbalance, benefited from synthetic augmentation methods such as the Synthetic Minority Oversampling Technique (SMOTE). When SMOTE-generated samples were combined with the original data, classification accuracy rose from 78.5% to 94.2% using an Extra Trees classifier in PyCaret's AutoML framework. Custom deep neural networks also performed better, with the more complex DNN2 model achieving 89.7% accuracy on SMOTE-augmented data compared to lower results on the original dataset alone.
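A minimal sketch of that SMOTE-plus-Extra-Trees pipeline follows, written directly against imbalanced-learn and scikit-learn rather than PyCaret's AutoML wrapper (in PyCaret, passing fix_imbalance=True to setup() applies SMOTE by default). The file path and column names follow the public Pima dataset and are assumptions.

```python
# Minimal SMOTE + Extra Trees sketch, assuming imbalanced-learn and
# scikit-learn; the paper ran this through PyCaret's AutoML, not shown here.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("pima_diabetes.csv")  # illustrative path
X, y = df.drop(columns="Outcome"), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# SMOTE interpolates new minority-class samples between real neighbours;
# fit_resample returns the original training rows plus the synthetic ones.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = ExtraTreesClassifier(n_estimators=300, random_state=42)
clf.fit(X_res, y_res)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```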
To further test data sensitivity, the authors introduced synthetic features derived from discretization techniques, converting continuous values into categorical representations. These transformations improved feature salience in several configurations, further boosting model performance, especially for the diabetes classification task.
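The article does not specify which discretization technique was used. Scikit-learn's KBinsDiscretizer is one plausible way to derive such categorical companion features, sketched below with invented glucose values.

```python
# Illustrative discretization of a continuous feature into ordinal bins,
# assuming scikit-learn's KBinsDiscretizer (the paper's exact method is
# not specified in the article).
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

glucose = np.array([[85.0], [120.0], [150.0], [199.0]])  # invented values

# Quantile binning assigns each value an ordinal bin index; the binned
# column can be appended to the feature set alongside the raw value.
disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
print(disc.fit_transform(glucose).ravel())  # e.g. [0. 1. 2. 2.]
```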
The study also incorporated LIME (Local Interpretable Model-Agnostic Explanations) to evaluate which features contributed most to the models' decisions. In high-performing models using SMOTE or feature-enriched data, key indicators such as body mass index (BMI), insulin levels, and glucose concentration were ranked as the most influential, aligning with clinical expectations. In contrast, models built on GCS data often produced less coherent feature importance rankings, suggesting potential distribution drift or over-smoothing in the synthetic generation process.
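As a rough illustration of how LIME produces such rankings, the sketch below explains a single prediction. It assumes the lime package and reuses the fitted classifier and splits (clf, X_train, X_test) from the SMOTE sketch above; these names are assumptions, not the paper's code.

```python
# Sketch of a per-prediction LIME explanation, assuming the lime package
# and the fitted classifier / data splits from the SMOTE sketch above.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    training_data=np.asarray(X_train),
    feature_names=list(X_train.columns),
    class_names=["non-diabetic", "diabetic"],
    mode="classification",
)

# LIME perturbs the chosen row, queries the model, and fits a local linear
# surrogate; the resulting weights rank features such as glucose or BMI.
exp = explainer.explain_instance(
    np.asarray(X_test)[0], clf.predict_proba, num_features=5)
print(exp.as_list())  # [(feature condition, weight), ...]
```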
While PyCaret’s AutoML platform was praised for its efficiency and speed, especially in early testing stages, the authors noted that carefully constructed deep neural networks offered greater flexibility and interpretability when tuned properly. The advanced DNN2 architecture, which included five hidden layers and dropout stages to prevent overfitting, demonstrated particular value when applied to hybrid datasets.
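The article describes DNN2 only as having five hidden layers with dropout stages. The Keras sketch below is one plausible reading of that description; the layer widths, dropout rates, and optimizer settings are chosen for illustration, not taken from the paper.

```python
# Illustrative five-hidden-layer network with dropout in the spirit of the
# article's DNN2; all hyperparameters here are assumptions, not the paper's.
from tensorflow import keras
from tensorflow.keras import layers

def build_dnn2(n_features: int) -> keras.Model:
    model = keras.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),                    # dropout stages curb overfitting
        layers.Dense(96, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(32, activation="relu"),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # binary diagnosis output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_dnn2(n_features=8)  # the Pima dataset has 8 input features
model.summary()
```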
Despite these gains, the researchers cautioned that synthetic data remains a double-edged sword. Without careful selection of generation methods, integration techniques, and validation tools, AI models may inherit bias, distort clinical indicators, or fail in real-world applications. They emphasized that LIME and other explainability tools are essential for validating not just performance metrics, but the integrity of underlying decision logic.
Simply put, synthetic data, when properly calibrated and combined with real records, can improve AI performance and reduce dependence on sensitive patient information. However, the models' sensitivity to how that data is generated and integrated remains a critical risk.
The authors call for future research to explore additional explainability methods, such as SHAP or counterfactual reasoning, and to incorporate larger, more diverse datasets for greater generalizability. They also recommend that AI development teams in healthcare prioritize transparency, robust validation, and clear documentation when using synthetic data, particularly in clinical contexts.
First published in: Devdiscourse