Boosting Healthcare Access in Vietnam with Fine-Tuned AI for Low-Resource Languages
Vietnamese researchers from RMIT, Hon Hai, and OUCRU successfully fine-tuned open-source AI models to enhance healthcare communication in Vietnamese, a low-resource language. Their approach significantly improved model accuracy, privacy, and accessibility, offering a scalable solution for equitable health information delivery.

A team of researchers from RMIT University in Ho Chi Minh City, Hon Hai Research Institute in Taipei, and the Oxford University Clinical Research Unit (OUCRU) has made a significant breakthrough in using artificial intelligence to bridge linguistic and technological gaps in healthcare. Their study outlines how fine-tuning Large Language Models (LLMs) can vastly improve health communication in low-resource languages such as Vietnamese. In an age where ChatGPT and similar tools dominate conversations around AI, access remains highly uneven, especially in developing countries. In Vietnam, for instance, the $20 monthly fee for premium AI services like GPT-4 is prohibitively expensive for many, considering the average monthly income is under $300. But it’s not just cost; the dominance of English in AI training datasets severely limits the performance of these models in Vietnamese, which comprises just 0.08% of LLaMA2’s training data. This linguistic bias undermines the utility of such tools in non-English-speaking contexts, particularly in specialized fields like medicine.
Building Smarter Vietnamese Models for Health
To solve this, the researchers fine-tuned three open-source LLMs (BloomZ-3B, LLaMA2-7B, and LLaMA2-13B) for the Vietnamese medical domain. The fine-tuning process was comprehensive, involving the selection of base models, the compilation of a large and relevant dataset, the implementation of fine-tuning techniques such as LoRA (Low-Rank Adaptation) and QLoRA (Quantized LoRA), and rigorous performance evaluation. At the core of their success was the creation of a high-quality dataset comprising over 337,000 prompt-response pairs in Vietnamese. They employed three strategies to gather data: translating English-language datasets (PubMedQA, HealthMagicQA) with the EnViT5 translation model, web-crawling trusted Vietnamese health forums like VnExpress Health and Vinmec, and generating question-answer pairs from 18 Vietnamese medical textbooks. The textbook distillation process used AI models to form questions and generate corresponding answers in Vietnamese, offering a reliable and contextually rich resource.
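The study itself does not publish its training code, but a minimal sketch of what QLoRA fine-tuning of an open model like LLaMA2-7B typically looks like is shown below, using the Hugging Face transformers, peft, bitsandbytes, and datasets libraries. The dataset file name, adapter settings, and batch sizes here are illustrative assumptions, not the authors' exact configuration; only the epoch count and learning rate mirror figures reported in the article.

```python
# Minimal QLoRA fine-tuning sketch (illustrative; not the authors' exact setup).
# Assumes a Vietnamese medical prompt-response dataset saved as JSON lines,
# e.g. {"prompt": "...", "response": "..."} per line (hypothetical file name).
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          TrainingArguments, Trainer, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # gated repo; access must be requested

# Load the frozen base model in 4-bit precision (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)

# Attach small low-rank adapters to the attention projections; only these are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hypothetical file of Vietnamese prompt-response pairs.
dataset = load_dataset("json", data_files="vi_medical_qa.jsonl", split="train")

def tokenize(example):
    # Concatenate prompt and response into one training sequence.
    return tokenizer(example["prompt"] + "\n" + example["response"],
                     truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama2-7b-vi-medical",
        num_train_epochs=6,             # the study reports six epochs
        learning_rate=1e-3,             # and a learning rate of 0.001
        per_device_train_batch_size=4,  # assumed values
        gradient_accumulation_steps=8,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("llama2-7b-vi-medical-adapter")  # saves adapter weights only
```

Because the base weights stay frozen in 4-bit precision and only the small adapter matrices are updated, this style of fine-tuning can run on far more modest hardware than full-parameter training, which is what makes workstation-class GPUs a realistic option.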
Dramatic Performance Gains in Evaluation
Once the dataset was compiled, each model was fine-tuned on high-performance hardware, ranging from NVIDIA RTX 3060 GPU workstations to the RMIT AWS Cloud Supercomputing Hub. The models were trained over six epochs with a learning rate of 0.001. To test the improvements, researchers adopted both quantitative and qualitative evaluation methods. Quantitatively, they used metrics such as BERTScore and ROUGE-L to assess the similarity between AI-generated responses and those given by certified Vietnamese physicians. The results were striking. The BERTScore F1 for the fine-tuned LLaMA2–13B jumped from 0.6424 to 0.8109, while its ROUGE-L F1 rose from 0.0043 to 0.2320. This means the responses became far more accurate, contextually relevant, and fluent in Vietnamese. The gains were consistent across multiple domains—cardiology, dermatology, psychology, and infectious diseases—regardless of the volume of domain-specific data. This suggests a robust and generalizable fine-tuning process that doesn’t require massive amounts of data for each subdomain to be effective.
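For readers who want to reproduce this kind of scoring, the sketch below shows how BERTScore and ROUGE-L can be computed with the bert-score and rouge-score Python packages. The example strings are placeholders, not data from the study, and the packages here are a reasonable assumption rather than the authors' confirmed tooling.

```python
# Quantitative evaluation sketch: compare a model answer against a physician
# reference with BERTScore and ROUGE-L. Example strings are placeholders.
from bert_score import score as bert_score
from rouge_score import rouge_scorer

references = ["Bạn nên uống đủ nước, nghỉ ngơi và theo dõi nhiệt độ cơ thể."]
candidates = ["Hãy nghỉ ngơi, uống nhiều nước và theo dõi thân nhiệt thường xuyên."]

# BERTScore: semantic similarity from multilingual contextual embeddings.
P, R, F1 = bert_score(candidates, references, lang="vi")
print(f"BERTScore F1: {F1.mean().item():.4f}")

# ROUGE-L: longest-common-subsequence overlap between candidate and reference.
scorer = rouge_scorer.RougeScorer(["rougeL"])
for cand, ref in zip(candidates, references):
    result = scorer.score(ref, cand)
    print(f"ROUGE-L F1: {result['rougeL'].fmeasure:.4f}")
```

The two metrics are complementary: BERTScore rewards answers that mean the same thing as the physician's, while ROUGE-L rewards literal word overlap, which helps explain why the base models' ROUGE-L scores started so close to zero.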
AI Judges the AI: GPT-4 Validates Quality
In the absence of human medical experts for evaluation, the team employed an approach known as LLM-as-a-Judge, using GPT-4 to evaluate and compare outputs from the base and fine-tuned models. GPT-4 acted as an independent evaluator, comparing responses on linguistic fluency, safety, factual accuracy, and appropriateness for Vietnamese users. The results mirrored the quantitative findings. For instance, 89.5% of the responses from the fine-tuned LLaMA2-13B were judged superior to those from its base version, while just 6.3% of the base model's responses were considered better. Even the smaller BloomZ-3B model saw significant quality gains, with over one-third of its outputs rated better than the base version's and most of the remainder rated equally good. These findings indicate that open-source LLMs, even with limited resources, can be significantly enhanced through domain-specific and language-specific fine-tuning.
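A minimal sketch of an LLM-as-a-Judge comparison is shown below, using the OpenAI Python client. The judging prompt, criteria wording, and output format are illustrative assumptions, not the study's exact protocol.

```python
# LLM-as-a-Judge sketch using the OpenAI Python client (v1+).
# Prompt wording and criteria are illustrative, not the study's exact protocol.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

JUDGE_PROMPT = """You are evaluating two Vietnamese answers to a patient's health question.
Criteria: linguistic fluency in Vietnamese, safety, factual accuracy, and
appropriateness for Vietnamese users.
Question: {question}
Answer A (base model): {answer_a}
Answer B (fine-tuned model): {answer_b}
Reply with exactly one of "A", "B", or "TIE", followed by a one-sentence reason."""

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT-4 to pick the better answer, or declare a tie."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question,
                                                  answer_a=answer_a,
                                                  answer_b=answer_b)}],
        temperature=0,  # deterministic judging
    )
    return response.choices[0].message.content

# Example call (placeholder strings):
# verdict = judge("Tôi bị sốt nhẹ hai ngày, tôi nên làm gì?", base_answer, tuned_answer)
```

In practice, judge-based comparisons are usually run twice with the answer order swapped, since LLM judges are known to show position bias; whether the study applied this control is not stated in the article.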
A Scalable Path Toward Equitable AI in Healthcare
Beyond improving accuracy and linguistic performance, a major highlight of this research is its emphasis on data privacy and ethical AI deployment. Unlike commercial LLMs that operate via cloud servers and retain user queries, the fine-tuned models can be deployed locally, ensuring that sensitive health information remains within institutional or personal systems. This aligns with WHO's guidelines on the ethical use of AI in healthcare, especially regarding privacy and data security. Still, the team acknowledges limitations. The absence of human evaluators with Vietnamese medical expertise means that some context-specific nuances may go unaddressed. Moreover, the high costs and technical demands of fine-tuning may still be beyond the reach of many developing institutions. Nonetheless, the study provides a replicable and scalable framework for adapting LLMs to other low-resource languages and specialized domains. By tailoring AI tools to local linguistic and cultural contexts, this work paves the way for more inclusive digital healthcare systems. It also reinforces the notion that technological progress, to be truly meaningful, must serve those most often left behind.
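The local-deployment argument can be made concrete with a short inference sketch: the base model plus the saved LoRA adapter are loaded on an on-premises machine, so patient queries never leave the institution. The model name and adapter path below are placeholders, assuming adapters saved with the peft library as in the earlier sketch.

```python
# Local inference sketch: load the base model plus a saved LoRA adapter and
# answer questions entirely on-premises. Model name and adapter path are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-2-13b-hf"
ADAPTER_DIR = "llama2-13b-vi-medical-adapter"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, torch_dtype=torch.float16,
                                            device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER_DIR)  # attach the fine-tuned adapter

prompt = "Tôi bị đau đầu kéo dài một tuần, tôi nên làm gì?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Because only the lightweight adapter weights differ from the publicly available base model, the whole stack can run inside a hospital or university network without routing sensitive queries through an external API.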
First published in: Devdiscourse