New AI model harmonizes medical data across institutions without compromising privacy
Electronic Health Records (EHRs) have become a fundamental resource for clinical and translational research, providing vast amounts of patient data for large-scale studies. However, conducting multi-institutional studies using EHR data remains a challenge due to data heterogeneity and privacy concerns.
A recent study, "Representation Learning to Advance Multi-Institutional Studies with Electronic Health Record Data" by Doudou Zhou, Han Tong, Linshanshan Wang, Suqi Liu, and others, posted on arXiv (2025), introduces a method called GAME (Graph Alignment for Multi-institutional EHR Data). The algorithm combines representation learning and federated learning to harmonize EHR data across institutions without sharing patient-level information, opening new possibilities for multi-center research.
Addressing data heterogeneity in multi-institutional EHR research
One of the biggest obstacles in multi-institutional EHR research is the inconsistency in medical coding systems across healthcare facilities. Different institutions use unique local codes for laboratory tests, diagnoses, and medications, making data integration difficult. The GAME algorithm tackles this challenge by utilizing knowledge graphs, pretrained language models, and graph attention networks to map local codes to standardized medical terminologies such as ICD, LOINC, and RxNorm.
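To make the idea of code mapping concrete, here is a minimal sketch of one common way to use embeddings for this task: represent every code as a vector and assign each local code to the most similar standard code by cosine similarity. This is an illustration, not the GAME implementation. The local code names, the function names, and the three-dimensional vectors are all invented for the example; real embeddings would be learned from data and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_local_code(local_vec, standard_embeddings):
    """Return (standard_code, similarity) for the closest standard code."""
    return max(
        ((code, cosine(local_vec, vec)) for code, vec in standard_embeddings.items()),
        key=lambda pair: pair[1],
    )


# Toy embeddings (assumed values, for illustration only).
standard_embeddings = {
    "LOINC:2345-7": [0.9, 0.1, 0.0],  # glucose in serum or plasma
    "LOINC:718-7": [0.0, 0.2, 0.9],   # hemoglobin in blood
}

# Hypothetical institution-specific lab codes.
local_codes = {
    "LAB_GLU_01": [0.8, 0.2, 0.1],
    "LAB_HGB_A": [0.1, 0.1, 0.95],
}

for name, vec in local_codes.items():
    best_code, score = map_local_code(vec, standard_embeddings)
    print(f"{name} -> {best_code} (similarity {score:.2f})")
```

In practice a similarity threshold would usually be added so that local codes with no good match are flagged for manual review rather than forced onto the nearest standard code.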
GAME operates at three levels of integration: (1) within institutions, it constructs knowledge graphs to establish relationships between local codes and standard codes; (2) between institutions, it leverages language models to identify relationships across different coding systems; and (3) across the resulting graph, it applies graph attention networks (GATs) to quantify the strength of these relationships. By jointly training embeddings through federated learning, GAME enables robust code translation across institutions while preserving patient privacy.
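The third level, weighting relationships with graph attention, can be sketched as follows. This follows the general graph attention recipe (score each neighbor, pass the score through a LeakyReLU, normalize with a softmax, then take a weighted sum), simplified by omitting the learned linear transform that a full GAT layer applies to each node first. The node names, the vectors, and the attention vector `a` are assumed values for illustration; the article does not describe GAME's actual architecture or parameters at this level of detail.

```python
import math


def leaky_relu(x, slope=0.2):
    """LeakyReLU activation, as used for attention scores in GAT layers."""
    return x if x > 0 else slope * x


def attention_weights(h_center, neighbor_vecs, a):
    """Softmax-normalized attention weight for each neighbor of one node.

    h_center: embedding of the node being updated (list of floats)
    neighbor_vecs: dict mapping neighbor name -> embedding
    a: attention vector with length 2 * len(h_center); learned in a real model
    """
    if not neighbor_vecs:
        return {}
    scores = {}
    for name, h_n in neighbor_vecs.items():
        concat = h_center + h_n  # list concatenation: [h_center || h_neighbor]
        scores[name] = leaky_relu(sum(w * x for w, x in zip(a, concat)))
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {n: math.exp(s - max_score) for n, s in scores.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


def aggregate(neighbor_vecs, weights):
    """Attention-weighted sum of neighbor embeddings."""
    dim = len(next(iter(neighbor_vecs.values())))
    return [
        sum(weights[n] * neighbor_vecs[n][k] for n in neighbor_vecs)
        for k in range(dim)
    ]


# Hypothetical local code with two candidate standard-code neighbors.
h_local = [1.0, 0.0]
neighbors = {"STD_A": [1.0, 0.0], "STD_B": [0.0, 1.0]}
a = [0.5, 0.5, 1.0, -1.0]  # assumed attention parameters

weights = attention_weights(h_local, neighbors, a)
updated = aggregate(neighbors, weights)
print(weights)
print(updated)
```

The weights sum to one, so they can be read as the relative strength of each relationship: a neighbor with a larger weight contributes more to the updated embedding of the local code.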
Preserving privacy with federated learning
Traditional multi-institutional collaborations require the sharing of patient-level EHR data, which raises significant privacy and compliance concerns. Federated learning offers a solution by enabling institutions to train models on local data and share only aggregated parameters rather than raw patient information. GAME incorporates federated learning techniques to create jointly trained embeddings without exposing individual patient records.
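A minimal sketch of the parameter-sharing step is shown below, using federated averaging (FedAvg), a widely used aggregation rule in which each site's parameters are weighted by its local sample count. The article does not say which aggregation rule GAME uses, so FedAvg is an assumption here, and the site parameters and sample counts are invented.

```python
def federated_average(updates):
    """Combine per-site parameters into one global parameter vector.

    updates: list of (params, n_samples) pairs, one per institution.
    Only these parameter vectors and counts are exchanged, never patient rows.
    """
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("at least one site must contribute samples")
    dim = len(updates[0][0])
    if any(len(params) != dim for params, _ in updates):
        raise ValueError("all sites must share the same parameter shape")
    return [
        sum(params[k] * n for params, n in updates) / total
        for k in range(dim)
    ]


# Hypothetical locally trained parameters from two institutions.
site_a = ([1.0, 2.0], 100)  # (parameters, number of local samples)
site_b = ([3.0, 4.0], 300)

global_params = federated_average([site_a, site_b])
print(global_params)  # [2.5, 3.5]
```

Weighting by sample count means larger sites pull the shared model toward their local estimate. In a full training loop this averaging step would repeat over many rounds, with each site continuing training from the latest shared parameters.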
The study evaluates GAME in a multi-institutional setting, testing it across seven healthcare institutions in the United States and France. The researchers applied the algorithm to patient stratification for several conditions, including heart failure, rheumatoid arthritis, Alzheimer's disease, and suicide risk, and report that it maintains data security while improving predictive modeling accuracy.
Enhancing clinical research through AI-driven data integration
Beyond harmonizing EHR data, GAME significantly improves the quality of multi-institutional studies by facilitating AI-driven feature selection and predictive analytics. The researchers evaluated the algorithm's performance by applying it to clinical studies on Alzheimer's disease outcomes and suicide risk among patients with mental health disorders. The results demonstrated that GAME-processed EHR data retained valuable clinical information, enabling accurate AI-driven patient stratification.
For Alzheimer's disease, the study used GAME embeddings to cluster patients based on clinical profiles and predict nursing home admissions, a key indicator of disease progression. Similarly, in suicide risk assessment, GAME embeddings allowed the identification of high-risk patient subgroups, demonstrating its potential in predictive healthcare applications. These findings indicate that GAME can enhance precision medicine efforts by enabling more accurate patient subgroup identification across diverse healthcare settings.
Future of multi-institutional EHR studies
The GAME algorithm represents a significant advancement in multi-institutional EHR research, offering a scalable, privacy-preserving, and interpretable approach to data harmonization. By combining representation learning, knowledge graphs, and federated learning, it provides a robust framework for integrating, analyzing, and interpreting heterogeneous EHR data across institutions.
As healthcare data continues to grow in complexity, AI-driven solutions like GAME will play an increasingly important role in breaking down data silos and enabling collaborative, large-scale medical research. Future research will focus on expanding GAME's applicability to new disease areas, refining its AI models, and integrating additional medical ontologies to further improve data standardization.
With GAME, the potential for conducting scalable, privacy-conscious, and high-quality multi-institutional EHR studies becomes a reality, paving the way for more comprehensive, data-driven insights in medical research and patient care.
First published in: Devdiscourse