Generative AI agents take on the challenge of protecting patient privacy

CO-EDP, VisionRI | Updated: 18-09-2025 23:31 IST | Created: 18-09-2025 23:31 IST

A team of researchers has developed a novel approach to generating synthetic clinical documentation. Their method seeks to balance privacy with linguistic and informational fidelity, offering a pathway toward more secure and usable medical records for research and healthcare innovation.

The paper, Privacy-, linguistic-, and information-preserving synthesis of clinical documentation through generative agents, introduces a generative agent-based protocol capable of producing synthetic health records that retain the structure and content of authentic clinical notes without exposing sensitive patient data. The study is published in Frontiers in Artificial Intelligence.

How can clinical text be made privacy-preserving yet useful?

Electronic health records (EHRs) are invaluable for medical research, but their use is hampered by strict privacy requirements. Traditional pseudonymization protects patients but still leaves traces that can be re-identified. Van Velzen and colleagues argue that fully synthetic text, created by large language models (LLMs) under controlled conditions, provides a stronger safeguard.

The proposed workflow uses a multi-agent system where supervisor and worker agents interact through multi-turn dialogues. The supervisor agent, modeled on an experienced clinician, guides the process, while the worker agent drafts synthetic records. Retrieval-augmented generation grounds the agents in real but anonymized examples and clinical guidelines, reducing the risk of fabrication.
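The dialogue loop described above can be sketched in miniature. Everything in this sketch is an illustrative stand-in, not the authors' implementation: the function names, the feedback rules, and the in-memory example store all substitute for real LLM agents and a real retrieval index.

```python
# Hypothetical sketch of a supervisor/worker synthesis loop with
# retrieval grounding. Stub functions stand in for LLM agents.

def retrieve_examples(store, topic, k=2):
    """Retrieval-augmented grounding: pull anonymized example notes on a topic."""
    return [note for note in store if topic in note][:k]

def worker_draft(examples, feedback=None):
    """Worker agent: draft a synthetic note from the retrieved examples."""
    draft = "Synthetic note based on: " + "; ".join(examples)
    if feedback:
        draft += f" (revised per supervisor: {feedback})"
    return draft

def supervisor_review(draft):
    """Supervisor agent (clinician persona): request a revision or approve."""
    if "revised" not in draft:
        return "add onset and duration of symptoms"
    return None  # approve

def synthesize(store, topic, max_turns=3):
    """Multi-turn dialogue: worker drafts, supervisor critiques, repeat."""
    examples = retrieve_examples(store, topic)
    draft, feedback = None, None
    for _ in range(max_turns):
        draft = worker_draft(examples, feedback)
        feedback = supervisor_review(draft)
        if feedback is None:  # supervisor approves the draft
            break
    return draft

store = ["lower back pain, anonymized note A", "knee pain, anonymized note B"]
print(synthesize(store, "lower back pain"))
```

In the real system the two roles would be separate LLM prompts and the store a retrieval index over anonymized notes and clinical guidelines; the control flow, however, follows the same draft-critique-revise pattern.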

The authors stress that the workflow is no-code and modular, so clinicians and researchers without AI expertise can implement it, adapt it to different contexts, and audit its performance. A human-in-the-loop design provides oversight, giving medical professionals the ability to check outputs and maintain trust in the process.

What did the researchers find when comparing synthetic and real notes?

The team tested their method on a small proof-of-concept dataset of Dutch clinical notes on lower back pain. They compared pseudonymized records with synthetic ones using a range of structural, linguistic, and semantic metrics.

The analysis revealed several differences. Synthetic records were on average about 30 percent shorter than pseudonymized notes, suggesting less redundancy but also indicating potential information loss. Word-level diversity was higher in synthetic notes, pointing to richer vocabulary, yet character-level diversity was lower, reflecting more uniform phrasing.
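Word-level versus character-level diversity can be illustrated with a simple type-token ratio. This is a toy version on made-up strings, assuming plain whitespace tokenization; the paper's exact diversity measures may differ.

```python
# Toy diversity metrics: type-token ratio (unique items / total items)
# at the word level and the character level.

def word_ttr(text):
    """Word-level diversity: unique words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words)

def char_ttr(text):
    """Character-level diversity: unique characters divided by total characters."""
    chars = text.replace(" ", "")
    return len(set(chars)) / len(chars)

# Illustrative strings: the "real" note repeats words, the "synthetic" one does not.
real = "patient reports pain pain in lower back pain radiating radiating"
synthetic = "patient notes discomfort localized lumbar region daily"

print(len(synthetic.split()) / len(real.split()))  # relative length
print(word_ttr(real), word_ttr(synthetic))         # word-level diversity
print(char_ttr(real), char_ttr(synthetic))         # character-level diversity
```

On these strings the synthetic text is both shorter and more word-diverse, mirroring the pattern the study reports.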

The team also assessed associations between words. Synthetic notes had weaker bigram connections, meaning they contained fewer conventional word pairings typically seen in clinical language. Semantic comparisons using BLEU, BERTScore, and divergence measures confirmed that while synthetic notes resembled real records, they remained distinguishable.
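One standard way to quantify how strongly adjacent words are associated is pointwise mutual information (PMI) over bigrams. The paper does not specify its exact association measure, so the sketch below is a generic illustration on a made-up token sequence.

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """PMI for each bigram: log2( p(w1,w2) / (p(w1) * p(w2)) ).

    High PMI means the pair co-occurs far more often than chance,
    e.g. conventional clinical pairings like 'lower back'."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / nb
        p_x, p_y = unigrams[w1] / n, unigrams[w2] / n
        pmi[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return pmi

tokens = "lower back pain lower back pain mild lower back pain".split()
scores = bigram_pmi(tokens)
print(scores[("lower", "back")])  # a strongly associated pairing scores > 0
```

Under a measure like this, text with fewer conventional word pairings, as the study found for synthetic notes, would show weaker average bigram association.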

A classification test further underscored this point. A machine learning model trained to distinguish between synthetic and pseudonymized records achieved perfect separation, with performance metrics of 1.0 for both AUC and AUPRC. While this result highlights limitations in the current generation method, it also provides a clear benchmark for improving realism in future iterations.
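The meaning of an AUC of 1.0 is easy to see from its ranking definition: if a classifier scores every synthetic note above every real note, the two classes are perfectly separable. The scores below are invented for illustration, not taken from the study.

```python
def auc(pos_scores, neg_scores):
    """Ranking-based AUC: probability that a randomly chosen positive
    outranks a randomly chosen negative (ties count as 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

synthetic_scores = [0.91, 0.88, 0.97]  # classifier scores for synthetic notes
real_scores = [0.12, 0.34, 0.05]       # classifier scores for pseudonymized notes
print(auc(synthetic_scores, real_scores))  # perfect separation gives 1.0
```

Conversely, an ideal generator would push this value toward 0.5, the point at which a classifier can no longer tell synthetic from real notes better than chance.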

What are the implications for healthcare research and practice?

According to the authors, synthetic text generation must not only preserve privacy but also retain clinical utility. Synthetic records that deviate too far from real-world language could hinder research, while records that too closely resemble real ones risk privacy breaches. The challenge lies in balancing these competing demands.

The workflow offers several advantages. By releasing the code publicly, the team enables transparency and reproducibility. The modular design allows adaptation to different clinical domains, while the inclusion of human oversight ensures that outputs remain clinically grounded. The approach also opens the door to wider use of EHR data for training AI models, testing clinical decision-support tools, and conducting large-scale studies without breaching privacy regulations.

However, the results also show that synthetic records remain separable from real ones, at least in their current form. The authors acknowledge that more work is needed to refine linguistic patterns, improve naturalness, and close the gap in realism. They emphasize that evaluation metrics must go beyond surface similarity, incorporating measures of clinical accuracy and utility.

In addition, the proof-of-concept was limited to a small dataset. Scaling the method to larger, more diverse corpora will be necessary to validate its effectiveness across different medical conditions and healthcare systems. Future research should also investigate how synthetic text interacts with machine learning pipelines, particularly in high-stakes applications like diagnosis and treatment planning.

First published in: Devdiscourse