Facebook makes FLORES-101 data set publicly available to break language barriers

By publicly releasing the high-quality dataset, Facebook hopes that researchers will accelerate work on multilingual translation models like M2M-100 and develop translation models in more languages, particularly in cases that do not necessarily involve English.


Devdiscourse News Desk | California | Updated: 05-06-2021 08:26 IST | Created: 05-06-2021 08:26 IST

Facebook has open-sourced FLORES-101, a many-to-many multilingual translation benchmark dataset covering 101 languages, to break down language barriers and empower researchers to create more diverse translation tools, the social networking giant said on Friday.

Machine translation helps bridge the language barriers between people and information. However, evaluating how well translation systems perform has been a major challenge for AI researchers. FLORES-101 provides a much-needed open and easily accessible way to perform high-quality, reliable measurement of many-to-many translation model performance.

FLORES-101 enables researchers to rapidly test and improve upon multilingual translation models like M2M-100. It focuses on languages such as Urdu that currently lack extensive data sets for natural language processing research.

With this tool, researchers will, for the first time, be able to reliably measure the quality of translations across 10,100 different translation directions, for example, directly from Hindi to Thai or Swahili. The data set contains the same set of sentences in all 101 languages, so every language can serve as either source or target, enabling researchers to evaluate the performance of any and all translation directions.
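The figure of 10,100 directions follows from simple counting: each of the 101 languages can be paired with each of the other 100 as a target, and the order of the pair matters. The short Python sketch below illustrates that arithmetic only. The function name `translation_directions` and the four-language sample are hypothetical choices made for illustration and are not part of the FLORES-101 release.

```python
from itertools import permutations


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


# Illustrative subset only; the language names are taken from the article.
sample = ["Hindi", "Thai", "Swahili", "Urdu"]
pairs = translation_directions(sample)

# Hindi -> Thai and Thai -> Hindi count as separate directions.
print(("Hindi", "Thai") in pairs, ("Thai", "Hindi") in pairs)

# Four languages give 4 * 3 ordered pairs.
print(len(pairs))

# The same count for 101 languages gives 101 * 100 directions.
num_languages = 101
print(num_languages * (num_languages - 1))
```

Because the sentences are aligned across every language, any one of these ordered pairs can in principle be scored against the same reference material, which is what makes a many-to-many evaluation possible.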

"For billions of people, especially non-English speakers, language remains a fundamental barrier to accessing information and communicating freely with other people. While there have been major advances in machine translation over the past few years, both at Facebook AI Research (FAIR) and elsewhere, a handful of languages have benefited most from these efforts. If the aim is to break down these language barriers and bring people closer together, then we must broaden our horizons," Facebook wrote in a blog post.

By publicly releasing the high-quality dataset, Facebook hopes that researchers will accelerate work on multilingual translation models like M2M-100 and develop translation models in more languages, particularly in cases that do not necessarily involve English.

"I think [FLORES] is a really exciting resource to help improve the representation of many languages within the machine translation community," said Graham Neubig, a professor at the Carnegie Mellon University Language Technologies Institute in the School of Computer Science.

 
