The magic of XLM-R: Unsupervised Cross-lingual Representation Learning at Scale

The goal of this paper, by Facebook AI, is to improve cross-lingual language understanding (XLU).

Previously, we discussed multilingual BERT (mBERT).


XLM-R: For the first time, it is possible to have a single large model for all languages, without sacrificing per-language performance!


Our best model, XLM-RoBERTa (XLM-R), outperforms mBERT on cross-lingual classification by up to 23% accuracy on low-resource languages.

XLM-R is very competitive with strong monolingual models.

XLM-R is a transformer-based multilingual masked language model (MLM) pre-trained on text in 100 languages!
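Because XLM-R is a masked language model, the quickest way to see it in action is a fill-mask query through the Hugging Face transformers library (a minimal sketch; the example sentences are mine, not from the paper):

```python
# Fill-mask with the pretrained multilingual MLM; the same checkpoint handles any of
# the 100 training languages. Requires the `transformers` package (and PyTorch).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="xlm-roberta-base")

# XLM-R uses <mask> as its mask token.
print(fill_mask("The capital of France is <mask>."))      # English
print(fill_mask("La capitale de la France est <mask>."))  # French
```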

XLM-R achieves state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.

The magic of Cross-lingual Transfer: fine-tune the model using task-specific supervised training data from one language, and evaluate that task in a different language.


1) Data and Low-resource languages

Pre-trained text representations have led to significant improvements in many areas of natural language processing. The quality of these models benefits greatly from the size of the pretraining corpora, as long as their quality is preserved.

In CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, the authors propose an automatic pipeline to extract massive, high-quality monolingual datasets from Common Crawl for a variety of languages.

The XLM-R model is trained on 2.5 TB of newly created, cleaned CommonCrawl data in 100 languages.

List of languages in the CC-100 corpus created for training XLM-R, including statistics such as the number of tokens and the size of each monolingual corpus.
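Because corpus sizes vary enormously across the 100 languages, the paper samples training batches from each language with smoothed probabilities (an exponent of α = 0.3 for XLM-R), which up-samples low-resource languages. Below is a minimal sketch of that sampling scheme; the token counts are hypothetical placeholders, not the real CC-100 statistics:

```python
# Language-sampling scheme described in the paper: batches are drawn from language i
# with probability q_i proportional to p_i**alpha, where p_i is the language's share
# of the total corpus and alpha = 0.3 up-samples low-resource languages.
token_counts = {"en": 55_000e6, "ur": 700e6, "sw": 250e6}  # illustrative numbers only

alpha = 0.3
total = sum(token_counts.values())
p = {lang: n / total for lang, n in token_counts.items()}   # raw corpus shares
weights = {lang: share ** alpha for lang, share in p.items()}
z = sum(weights.values())
q = {lang: w / z for lang, w in weights.items()}            # smoothed sampling probabilities

for lang in token_counts:
    print(f"{lang}: raw share {p[lang]:.4f} -> sampling probability {q[lang]:.4f}")
```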


2) Models

A list of the monolingual and multilingual models used by the research community, summarizing their architectures and total number of parameters.


L: number of layers; Hm: number of hidden states; Hff: dimension of the feed-forward layer; A: number of attention heads; V: vocabulary size.
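To check these hyperparameters for a specific checkpoint, you can inspect its published configuration through the transformers library (a quick sketch; the printed values are whatever the downloaded config reports):

```python
# Inspect XLM-R architecture hyperparameters from its published config
# (requires the `transformers` package and a network connection to fetch the config).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("xlm-roberta-base")
print("L  (layers):          ", config.num_hidden_layers)
print("Hm (hidden size):     ", config.hidden_size)
print("Hff (feed-forward):   ", config.intermediate_size)
print("A  (attention heads): ", config.num_attention_heads)
print("V  (vocabulary size): ", config.vocab_size)
```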


3) Evaluation

  • Cross-lingual Natural Language Inference (XNLI) for cross-lingual classification (a loading sketch follows this list).
  • Named Entity Recognition (CoNLL-2002 (Sang, 2002) and CoNLL-2003).
  • Cross-lingual Question Answering on the MLQA benchmark.
  • GLUE benchmark: evaluating English performance on multiple classification tasks, such as MNLI, SST-2, and QNLI.
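As a quick illustration, the XNLI data used for cross-lingual classification can be pulled from the Hugging Face hub (a minimal sketch; the dataset name "xnli" and its language configs are hub conventions, not something defined in the paper):

```python
# Load the Arabic test split of XNLI, e.g. for zero-shot evaluation of a model
# fine-tuned on English. Requires the `datasets` package.
from datasets import load_dataset

xnli_ar = load_dataset("xnli", "ar", split="test")
example = xnli_ar[0]
print(example["premise"])     # Arabic premise sentence
print(example["hypothesis"])  # Arabic hypothesis sentence
print(example["label"])       # 0 = entailment, 1 = neutral, 2 = contradiction
```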

4) Analysis and Results

  • Performance vs the number of languages. Initially, as we go from 7 to 15 languages, the model is able to take advantage of positive transfer which improves performance, especially on low resource languages. Beyond this point, the curse of multilinguality kicks in and degrades performance across all languages.
  • Adding more capacity to the model alleviates the curse of multilinguality, but it remains an issue for models of moderate size (a high-resource vs. low-resource trade-off).
  • Wikipedia versus CommonCrawl: models obtain significantly better performance when trained on CC, in particular on low-resource languages.
  • A larger batch size gives better accuracy.
  • Multilingual models can outperform their monolingual BERT counterparts.


5) Representation Learning for Low-resource Languages

mBERT and XLM-100 rely heavily on cross-lingual transfer but do not model the low-resource languages as well as XLM-R.


6) Cross-lingual Transfer

As shown, when the multilingual model is fine-tuned on the English training set only, performance on other languages (Arabic, for example) improves as well.

The magic of Cross-lingual Transfer: fine-tune XLM-R on language_1 (e.g., English) and test it on language_2 (e.g., Arabic), without any fine-tuning on language_2!
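A minimal sketch of this zero-shot transfer setup with the Hugging Face transformers and datasets libraries (the hyperparameters and the use of XNLI's English training split are illustrative assumptions, not the authors' exact recipe):

```python
# Fine-tune XLM-R on English NLI data only, then evaluate zero-shot on Arabic.
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def encode(batch):
    # XNLI pairs: premise + hypothesis -> entailment / neutral / contradiction
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

train_en = load_dataset("xnli", "en", split="train").map(encode, batched=True)
test_ar = load_dataset("xnli", "ar", split="test").map(encode, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(output_dir="xlmr-xnli-en",
                         per_device_train_batch_size=32,
                         num_train_epochs=2,
                         learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=train_en, compute_metrics=accuracy)

trainer.train()                                 # fine-tune on English only
print(trainer.evaluate(eval_dataset=test_ar))   # zero-shot evaluation on Arabic
```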


7) Conclusion

This work demonstrates the surprising effectiveness of multilingual models over monolingual models and shows strong improvements on low-resource languages.


8) Code is easy!

https://github.com/pytorch/fairseq/tree/master/examples/xlmr
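The fairseq README shows loading the released checkpoints through torch.hub, roughly along these lines (a sketch from memory; check the linked README for the exact, up-to-date interface):

```python
# Load the released XLM-R checkpoint via torch.hub (fairseq's published interface)
# and extract contextual features for a sentence.
import torch

xlmr = torch.hub.load("pytorch/fairseq", "xlmr.large")
xlmr.eval()  # disable dropout for inference

tokens = xlmr.encode("Hello world!")        # SentencePiece-encode to token ids
features = xlmr.extract_features(tokens)    # last-layer representations
print(tokens)
print(features.shape)                       # (1, sequence_length, hidden_size)
```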

https://huggingface.co/transformers/model_doc/xlmroberta.html

https://youtu.be/mGdg_iPoXTs?t=724

Regards
