AI can balance data privacy and utility in clinical decision support systems
A new study shows that artificial intelligence (AI) can preserve patient privacy without compromising clinical accuracy. The researchers demonstrate that privacy-preserving techniques, when applied to electronic health records (EHRs), can successfully balance data security and model performance, setting a new benchmark for safe AI deployment in healthcare.
The study, titled “Balancing Privacy and Utility in Artificial Intelligence-Based Clinical Decision Support: Empirical Evaluation Using De-Identified Electronic Health Record Data” and published in Applied Sciences, evaluates how different data de-identification strategies and differential privacy methods affect both privacy protection and predictive accuracy in clinical decision support systems (CDSS).
How can hospitals use AI without risking patient privacy?
As hospitals and research institutions increasingly rely on large datasets to train clinical models, concerns about re-identification, membership inference, and model theft have become critical. Even after direct identifiers such as names or patient numbers are removed, de-identified data can still expose individuals through quasi-identifiers: combinations of variables such as sex, age, and diagnosis.
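To make that risk concrete, a simple illustration (not drawn from the study) is the share of records that remain unique on their quasi-identifier combination; such records can be matched one-to-one by anyone who already knows those attributes about a person. The column names below are hypothetical, chosen only for the sketch.

```python
# Illustrative check, not the study's code: what fraction of records are
# unique on their quasi-identifiers and therefore trivially re-identifiable?
import pandas as pd

def unique_fraction(df: pd.DataFrame,
                    quasi_identifiers=("sex", "age", "diagnosis")) -> float:
    # Size of each "equivalence class": records sharing the same
    # quasi-identifier values are indistinguishable from one another.
    class_sizes = df.groupby(list(quasi_identifiers)).size()
    # A class of size 1 contains exactly one, uniquely matchable, record.
    return float((class_sizes == 1).sum()) / len(df)
```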
To tackle these vulnerabilities, the researchers designed an integrated privacy evaluation framework that tested both traditional and modern privacy-preserving methods. They examined three de-identification strategies: baseline generalization, enhanced generalization, and enhanced generalization with suppression. These were evaluated alongside a machine learning defense technique known as differentially private stochastic gradient descent (DP-SGD).
The dataset consisted of 100,000 diagnostic records collected between 2016 and 2022 at a university-affiliated tertiary medical center. Each record contained 43 variables, including demographic and diagnostic data. The analysis simulated a multi-institutional scenario, with the goal of maintaining clinical relevance while limiting privacy risks. All experiments were approved by the Institutional Review Board of Wonju Severance Christian Hospital.
What happens when privacy and utility collide?
The study’s results show that enhanced generalization with suppression, the strongest de-identification strategy, significantly reduced re-identification risk. By merging age ranges into larger categories and suppressing records with fewer than five similar cases, the model effectively eliminated small, high-risk data clusters. This process shifted the dataset toward safer anonymity levels without drastically reducing the number of usable records.
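The mechanics of that strategy can be sketched in a few lines of Python. The column names, the 10-year age bands, and the threshold of five records below are illustrative assumptions patterned on the description above, not the study’s exact generalization rules.

```python
# Minimal sketch of enhanced generalization with suppression (assumed columns:
# age, sex, diagnosis). Records whose quasi-identifier combination is shared
# by fewer than k individuals form small, high-risk clusters and are dropped.
import pandas as pd

def generalize_and_suppress(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    out = df.copy()
    # Generalization: collapse exact ages into coarser 10-year bands.
    out["age_band"] = (out["age"] // 10 * 10).astype(int).astype(str) + "s"
    quasi_identifiers = ["age_band", "sex", "diagnosis"]
    # Suppression: remove records in equivalence classes smaller than k.
    class_size = out.groupby(quasi_identifiers)["age_band"].transform("size")
    return out[class_size >= k].drop(columns=["age"])
```

After this step, every remaining record shares its age band, sex, and diagnosis with at least four others, which is what shifts the dataset toward safer anonymity levels.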
At the same time, the researchers applied DP-SGD during model training. This technique introduces calibrated noise to the model’s gradient updates, ensuring that no single data point has an identifiable influence on the model’s outcome. By maintaining a privacy budget of ε = 2.488 and δ = 10⁻⁵, the model achieved a balance between privacy and performance, a relationship often described as the “privacy–utility trade-off.”
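In schematic terms, each training step clips every record’s gradient to a fixed norm and adds Gaussian noise before updating the model, so the update looks nearly the same whether or not any one patient’s record is present. The logistic-regression setup, clipping norm, and noise multiplier below are illustrative assumptions, not the study’s configuration; in practice the noise level is chosen so that a privacy accountant certifies the target ε and δ.

```python
# Illustrative single DP-SGD step for logistic regression (NumPy sketch).
import numpy as np

def dp_sgd_step(w, X_batch, y_batch, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    clipped = []
    for x, y in zip(X_batch, y_batch):
        p = 1.0 / (1.0 + np.exp(-x @ w))       # per-example predicted probability
        g = (p - y) * x                        # per-example gradient
        # Clip each record's gradient so its norm is at most clip_norm.
        clipped.append(g / max(1.0, np.linalg.norm(g) / clip_norm))
    # Calibrated Gaussian noise masks any single record's contribution.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=w.shape)
    g_private = (np.sum(clipped, axis=0) + noise) / len(X_batch)
    return w - lr * g_private
```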
The analysis covered three major attack scenarios: re-identification risk, membership inference attacks (MIA), and model extraction attacks (MEA). The outcomes were decisive. Membership inference attacks, which test whether an individual’s data was used during training, remained at chance level under all conditions. This means that attackers could not infer a patient’s inclusion in the dataset, confirming that participation remained untraceable.
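One common way to run such a test, shown here as a generic sketch rather than the study’s implementation, is a loss-threshold attack: if training members have systematically lower loss than non-members, an attacker can score membership better than chance. An AUC near 0.5 corresponds to the chance-level result reported in the study.

```python
# Generic loss-threshold membership inference test (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score

def membership_inference_auc(member_losses, nonmember_losses) -> float:
    member_losses = np.asarray(member_losses, dtype=float)
    nonmember_losses = np.asarray(nonmember_losses, dtype=float)
    # Lower loss suggests the record was seen during training, so negate the
    # loss to turn it into a "membership score".
    scores = -np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses),
                             np.zeros_like(nonmember_losses)])
    return roc_auc_score(labels, scores)  # ~0.5 means membership is untraceable
```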
Model extraction attacks, however, revealed a different pattern. Without privacy protection, a secondary “student” model could replicate the original “victim” model’s predictions with near-perfect fidelity. Once DP-SGD was introduced, replication accuracy collapsed, proving that the noise-based privacy mechanism effectively disrupted the attack. Despite this defense, the model’s clinical accuracy remained stable, with an area under the receiver operating characteristic curve (AUROC) of 0.73 and overall accuracy of 95 percent.
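The extraction scenario can be simulated by querying the victim model and training a student on its answers, with fidelity measured as how often the two models agree. The scikit-learn setup below is an illustrative sketch of that idea under assumed inputs, not the study’s configuration.

```python
# Illustrative model extraction simulation: the "student" learns only from the
# victim's predicted labels, and fidelity is the rate of agreement on new data.
import numpy as np
from sklearn.linear_model import LogisticRegression

def extraction_fidelity(victim, X_query, X_test) -> float:
    student = LogisticRegression(max_iter=1000)
    student.fit(X_query, victim.predict(X_query))   # train on stolen labels
    fidelity = np.mean(student.predict(X_test) == victim.predict(X_test))
    return float(fidelity)  # near 1.0 without defenses; falls sharply under DP-SGD
```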
The findings demonstrate that combining suppression-based de-identification with DP-SGD offers a dual layer of defense. While generalization and suppression protect data at the record level, DP-SGD guards against inference and extraction threats during model training. This layered approach preserves both data integrity and predictive reliability, ensuring that AI-based decision systems remain clinically usable.
Can AI-based clinical systems be both secure and reliable?
The implications of this research extend far beyond technical performance. The authors emphasize that privacy-preserving frameworks can enhance trust and compliance in clinical environments where patient confidentiality is paramount. They argue that while many previous studies focused on theoretical models or synthetic datasets, this work offers rare empirical evidence from real-world, multi-year EHR data.
The researchers also highlight the practical benefits of these defenses for hospitals and health institutions. Suppression-based de-identification can be applied before data sharing, minimizing re-identification risks across multi-institutional collaborations. Meanwhile, differential privacy mechanisms like DP-SGD can be integrated directly into training pipelines, protecting proprietary models from being copied or reverse-engineered. This combination establishes a new standard for responsible AI adoption in healthcare.
Moreover, the study found no significant performance drop in predictive outcomes even with strong privacy constraints. The authors note that this finding dispels the long-standing notion that privacy always comes at the cost of utility. Instead, their results demonstrate that meaningful privacy protection is possible without compromising the clinical value of AI models.
The research also sheds light on specific vulnerabilities associated with rare diagnostic subgroups, which can inadvertently increase privacy risk. Suppression of these small categories was shown to effectively neutralize such threats, though the authors caution that over-suppression may reduce data diversity. They recommend transparent reporting of suppression rates and sensitivity analyses to ensure accountability and reproducibility.
A new standard for privacy-conscious AI in medicine
The study establishes a concrete, evidence-based foundation for balancing privacy and performance in healthcare AI. By empirically testing multiple threat scenarios, it demonstrates that a two-tier privacy model (data-level de-identification combined with training-level differential privacy) can mitigate risk without degrading diagnostic accuracy.
The researchers propose that hospitals and research centers adopt this dual framework as a minimum standard for AI-based clinical decision support systems. They argue that such safeguards not only protect patients but also preserve the proprietary value of medical AI models, which are increasingly viewed as strategic institutional assets.
The study further calls for multi-institutional validation to ensure the reproducibility of these findings in diverse healthcare environments. Future research, they suggest, should test wider ranges of privacy budgets (ε values), assess bias introduced by suppression, and evaluate additional privacy metrics such as l-diversity and t-closeness. They also advocate for integrating federated learning, where data remains within individual institutions but models are trained collaboratively, to strengthen data governance and security.

