The magic of XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group | LinkedIn Top Voice | Machine Learning | Deep Learning | Data Science | Computer Vision | NLP | Developer | Researcher | Lecturer
The goal of this paper, by Facebook AI, is to improve cross-lingual language understanding (XLU).
Previously, we discussed multilingual BERT (M-BERT).
XLM-R: For the first time, it is possible to have a single large model for all languages, without sacrificing per-language performance!
Our best model, XLM-RoBERTa (XLM-R), outperforms mBERT on cross-lingual classification by up to 23% accuracy on low-resource languages.
XLM-R is very competitive with strong monolingual models.
XLM-R is a transformer-based multilingual masked language model (MLM) pre-trained on text in 100 languages!
XLM-R achieves state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
The magic of Cross-lingual Transfer: fine-tune the model using task-specific supervised training data from one language, and evaluate the same task in a different language.
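Since a single MLM is shared across all 100 languages, the quickest way to see it in action is a fill-mask call. Below is a minimal sketch using the Hugging Face transformers pipeline (my choice of library and checkpoint name, not the paper's fairseq setup); the predicted tokens may vary between checkpoints.

```python
# A minimal fill-mask sketch with the Hugging Face pipeline (an assumption;
# the paper's own code lives in fairseq). One checkpoint, many languages.
from transformers import pipeline

fill = pipeline("fill-mask", model="xlm-roberta-base")

# The same model fills the mask in English and in Arabic.
print(fill("The capital of France is <mask>.")[0]["token_str"])
print(fill("عاصمة فرنسا هي <mask>.")[0]["token_str"])
```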
1) Data and Low-resource Languages
Pre-trained text representations have led to significant improvements in many areas of natural language processing. The quality of these models benefits greatly from the size of the pre-training corpora, as long as their quality is preserved.
In CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, an automatic pipeline is proposed for extracting massive, high-quality monolingual datasets from Common Crawl for a variety of languages.
The XLM-R model is trained on 2.5 TB of newly created, clean CommonCrawl data covering 100 languages.
List of languages in the CC-100 corpus created for training XLM-R, including statistics such as the number of tokens and the size of each monolingual corpus.
2) Models
A list of the monolingual and multilingual models used by the research community, summarizing their architectures and total number of parameters.
L: number of layers; Hm: hidden size; Hff: dimension of the feed-forward layer; A: number of attention heads; V: vocabulary size.
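To make these hyperparameters concrete, here is a small sketch that maps them onto a Hugging Face XLMRobertaConfig (my own illustration, not the paper's code). The values are the commonly reported XLM-R Base settings: L=12, Hm=768, Hff=3072, A=12, with a shared ~250k SentencePiece vocabulary; XLM-R Large uses L=24, Hm=1024, Hff=4096, A=16.

```python
# A sketch of the architecture hyperparameters above, expressed as a
# Hugging Face XLMRobertaConfig (library choice is my assumption).
from transformers import XLMRobertaConfig, XLMRobertaModel

xlmr_base = XLMRobertaConfig(
    num_hidden_layers=12,      # L  : number of layers
    hidden_size=768,           # Hm : hidden size
    intermediate_size=3072,    # Hff: feed-forward dimension
    num_attention_heads=12,    # A  : attention heads
    vocab_size=250002,         # V  : shared SentencePiece vocabulary (~250k)
)

model = XLMRobertaModel(xlmr_base)  # randomly initialized, for counting only
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```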
3) Evaluation
- Cross-lingual Natural Language Inference (XNLI) for cross-lingual classification across 15 languages, including machine-translation baselines (translate-test and translate-train).
- Named Entity Recognition (CoNLL-2002 (Sang, 2002) and CoNLL-2003).
- Cross-lingual Question Answering, using the MLQA benchmark.
- GLUE benchmark: evaluate English performance on multiple classification tasks such as MNLI, SST-2, and QNLI.
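For reference, these benchmarks can be pulled with the Hugging Face datasets library; the dataset IDs and config names below are my assumption and may differ between library versions.

```python
# Hedged sketch: loading the evaluation benchmarks with the `datasets`
# library (dataset/config names are assumptions, not from the paper).
from datasets import load_dataset

xnli_ar = load_dataset("xnli", "ar", split="test")               # XNLI, Arabic
mlqa_ar = load_dataset("mlqa", "mlqa.ar.ar", split="test")       # MLQA, Arabic QA
mnli = load_dataset("glue", "mnli", split="validation_matched")  # GLUE / MNLI

print(xnli_ar[0])   # {'premise': ..., 'hypothesis': ..., 'label': ...}
```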
4) Analysis and Results
- Performance vs. the number of languages: initially, as we go from 7 to 15 languages, the model is able to take advantage of positive transfer, which improves performance, especially on low-resource languages. Beyond this point, the curse of multilinguality kicks in and degrades performance across all languages.
- Adding more capacity to the model alleviates the curse of multilinguality, but it remains an issue for models of moderate size.
- High-resource vs. low-resource trade-off: the language sampling rate controls how strongly high-resource languages are favored over low-resource ones during pre-training (see the sampling sketch after this list).
- Wikipedia versus CommonCrawl: models obtain significantly better performance when trained on CC, in particular on low-resource languages.
- A larger batch size gives better accuracy.
- Multilingual models can outperform their monolingual BERT counterparts.
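The high-resource vs. low-resource trade-off mentioned in the list above is governed by how languages are sampled during pre-training: sampling probabilities are exponentially smoothed with a parameter alpha (the paper reports using alpha = 0.3 for XLM-R). Here is a small sketch of that smoothing; the corpus sizes are purely hypothetical.

```python
# Exponentially smoothed language sampling, q_i proportional to p_i^alpha.
# alpha = 0.3 follows the paper; the corpus sizes below are hypothetical.
def sampling_probs(sentence_counts, alpha=0.3):
    total = sum(sentence_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in sentence_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}

# Smaller alpha up-weights low-resource languages (sw, ur) relative to en.
print(sampling_probs({"en": 300_000_000, "sw": 1_000_000, "ur": 3_000_000}))
print(sampling_probs({"en": 300_000_000, "sw": 1_000_000, "ur": 3_000_000}, alpha=1.0))
```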
5) Representation Learning for Low-resource Languages
mBERT and XLM-100 rely heavily on cross-lingual transfer but do not model low-resource languages as well as XLM-R does.
6) Cross-lingual Transfer
As shown, when the multilingual model is fine-tuned on the English training set, the results for other languages (Arabic, for example) improve as well.
The magic of Cross-lingual Transfer: fine-tune XLM-R on language_1 (e.g., English) and test it on language_2 (e.g., Arabic), without any fine-tuning on language_2!
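A rough end-to-end sketch of this zero-shot setup, using Hugging Face transformers and datasets rather than the authors' fairseq code (checkpoint name, hyperparameters, and dataset IDs are my assumptions): fine-tune on English XNLI only, then evaluate on the Arabic test set.

```python
# Zero-shot cross-lingual transfer sketch: train on English XNLI,
# evaluate on Arabic XNLI with no Arabic fine-tuning. Library choices
# and hyperparameters are assumptions, not the paper's exact recipe.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)  # entailment / neutral / contradiction

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

train_en = load_dataset("xnli", "en", split="train").map(tokenize, batched=True)
test_ar = load_dataset("xnli", "ar", split="test").map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-xnli-en",
                           per_device_train_batch_size=32,
                           num_train_epochs=2,
                           learning_rate=2e-5),
    train_dataset=train_en,   # English supervision only
    eval_dataset=test_ar,     # Arabic evaluation only
    tokenizer=tokenizer,
    compute_metrics=accuracy,
)

trainer.train()
print(trainer.evaluate())     # zero-shot accuracy on Arabic
```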
7) Conclusion
This work demonstrates the surprising effectiveness of multilingual models over monolingual models, and shows strong improvements on low-resource languages.
8) Code is easy!
https://github.com/pytorch/fairseq/tree/master/examples/xlmr
https://huggingface.co/transformers/model_doc/xlmroberta.html
https://youtu.be/mGdg_iPoXTs?t=724
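As a starting point, here is a minimal usage sketch via torch.hub, roughly following the fairseq example repository linked above (the exact API may differ between fairseq versions):

```python
# Load a released XLM-R checkpoint through torch.hub (fairseq).
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.base')  # or 'xlmr.large'
xlmr.eval()

tokens = xlmr.encode('مرحبا بالعالم')        # SentencePiece encoding, any language
features = xlmr.extract_features(tokens)     # last-layer hidden states
print(features.shape)                        # (1, sequence_length, 768)
```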
Regards