The magic of XLM-R: Unsupervised Cross-lingual Representation Learning at Scale
Ibrahim Sobh - PhD
Senior Expert of Artificial Intelligence, Valeo Group | LinkedIn Top Voice | Machine Learning | Deep Learning | Data Science | Computer Vision | NLP | Developer | Researcher | Lecturer
The goal of this paper, by Facebook AI, is to improve cross-lingual language understanding (XLU).
Previously, we discussed multilingual BERT (M-BERT).
XLM-R: For the first time, it is possible to have a single large model for all languages, without sacrificing per-language performance!
Our best model, XLM-RoBERTa (XLM-R), outperforms mBERT on cross-lingual classification by up to 23% accuracy on low-resource languages.
XLM-R is very competitive with strong monolingual models.
XLM-R is a transformer-based multilingual masked language model (MLM) pre-trained on text in 100 languages!
XLM-R achieves state-of-the-art performance on cross-lingual classification, sequence labeling and question answering.
The magic of Cross-lingual Transfer: fine-tune the model using task-specific supervised training data from one language, and evaluate the same task in a different language.
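Since a single MLM is shared across all 100 languages, the quickest way to see it in action is a fill-mask call. Below is a minimal sketch using the Hugging Face transformers pipeline (my choice of library and checkpoint name, not the paper's fairseq setup); the predicted tokens may vary between checkpoints.

```python
# A minimal fill-mask sketch with the Hugging Face pipeline (an assumption;
# the paper's own code lives in fairseq). One checkpoint, many languages.
from transformers import pipeline

fill = pipeline("fill-mask", model="xlm-roberta-base")

# The same model fills the mask in English and in Arabic.
print(fill("The capital of France is <mask>.")[0]["token_str"])
print(fill("عاصمة فرنسا هي <mask>.")[0]["token_str"])
```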
1) Data and Low-resource Languages
Pre-trained text representations have led to significant improvements in many areas of natural language processing. The quality of these models benefits greatly from the size of the pre-training corpora, as long as their quality is preserved.
In CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data, an automatic pipeline is proposed for extracting massive, high-quality monolingual datasets from Common Crawl for a variety of languages.
The XLM-R model is trained on 2.5 TB of newly created, clean CommonCrawl data covering 100 languages.
List of languages in the CC-100 corpus created for training XLM-R, including statistics such as the number of tokens and the size of each monolingual corpus.
2) Models
A list of the monolingual and multilingual models used by the research community, summarizing their architectures and total number of parameters.
L: number of layers; Hm: hidden size; Hff: dimension of the feed-forward layer; A: number of attention heads; V: vocabulary size.
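To make these hyperparameters concrete, here is a small sketch that maps them onto a Hugging Face XLMRobertaConfig (my own illustration, not the paper's code). The values are the commonly reported XLM-R Base settings: L=12, Hm=768, Hff=3072, A=12, with a shared ~250k SentencePiece vocabulary; XLM-R Large uses L=24, Hm=1024, Hff=4096, A=16.

```python
# A sketch of the architecture hyperparameters above, expressed as a
# Hugging Face XLMRobertaConfig (library choice is my assumption).
from transformers import XLMRobertaConfig, XLMRobertaModel

xlmr_base = XLMRobertaConfig(
    num_hidden_layers=12,      # L  : number of layers
    hidden_size=768,           # Hm : hidden size
    intermediate_size=3072,    # Hff: feed-forward dimension
    num_attention_heads=12,    # A  : attention heads
    vocab_size=250002,         # V  : shared SentencePiece vocabulary (~250k)
)

model = XLMRobertaModel(xlmr_base)  # randomly initialized, for counting only
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.0f}M parameters")
```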
3) Evaluation
- Cross-lingual Natural Language Inference (XNLI) for cross-lingual classification across 15 languages, including machine-translation baselines (translate-test and translate-train).
- Named Entity Recognition (CoNLL-2002 (Sang, 2002) and CoNLL-2003).
- Cross-lingual Question Answering, using the MLQA benchmark.
- GLUE benchmark: evaluate English performance on multiple classification tasks such as MNLI, SST-2, and QNLI.
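For reference, these benchmarks can be pulled with the Hugging Face datasets library; the dataset IDs and config names below are my assumption and may differ between library versions.

```python
# Hedged sketch: loading the evaluation benchmarks with the `datasets`
# library (dataset/config names are assumptions, not from the paper).
from datasets import load_dataset

xnli_ar = load_dataset("xnli", "ar", split="test")               # XNLI, Arabic
mlqa_ar = load_dataset("mlqa", "mlqa.ar.ar", split="test")       # MLQA, Arabic QA
mnli = load_dataset("glue", "mnli", split="validation_matched")  # GLUE / MNLI

print(xnli_ar[0])   # {'premise': ..., 'hypothesis': ..., 'label': ...}
```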
4) Analysis and Results
- Performance vs. the number of languages: initially, as we go from 7 to 15 languages, the model is able to take advantage of positive transfer, which improves performance, especially on low-resource languages. Beyond this point, the curse of multilinguality kicks in and degrades performance across all languages.
- Adding more capacity to the model alleviates the curse of multilinguality, but it remains an issue for models of moderate size.
- High-resource vs. low-resource trade-off: the language sampling rate controls how strongly high-resource languages are favored over low-resource ones during pre-training (see the sampling sketch after this list).
- Wikipedia versus CommonCrawl: models obtain significantly better performance when trained on CC, in particular on low-resource languages.
- A larger batch size gives better accuracy.
- Multilingual models can outperform their monolingual BERT counterparts.
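The high-resource vs. low-resource trade-off mentioned in the list above is governed by how languages are sampled during pre-training: sampling probabilities are exponentially smoothed with a parameter alpha (the paper reports using alpha = 0.3 for XLM-R). Here is a small sketch of that smoothing; the corpus sizes are purely hypothetical.

```python
# Exponentially smoothed language sampling, q_i proportional to p_i^alpha.
# alpha = 0.3 follows the paper; the corpus sizes below are hypothetical.
def sampling_probs(sentence_counts, alpha=0.3):
    total = sum(sentence_counts.values())
    weights = {lang: (n / total) ** alpha for lang, n in sentence_counts.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}

# Smaller alpha up-weights low-resource languages (sw, ur) relative to en.
print(sampling_probs({"en": 300_000_000, "sw": 1_000_000, "ur": 3_000_000}))
print(sampling_probs({"en": 300_000_000, "sw": 1_000_000, "ur": 3_000_000}, alpha=1.0))
```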
5) Representation Learning for Low-resource Languages
mBERT and XLM-100 rely heavily on cross-lingual transfer but do not model low-resource languages as well as XLM-R does.
6) Cross-lingual Transfer
As shown, when the multilingual model is fine-tuned on the English training set, the results for other languages (Arabic, for example) improve as well.
The magic of Cross-lingual Transfer: fine-tune XLM-R on language_1 (e.g., English) and test it on language_2 (e.g., Arabic), without any fine-tuning on language_2!
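A rough end-to-end sketch of this zero-shot setup, using Hugging Face transformers and datasets rather than the authors' fairseq code (checkpoint name, hyperparameters, and dataset IDs are my assumptions): fine-tune on English XNLI only, then evaluate on the Arabic test set.

```python
# Zero-shot cross-lingual transfer sketch: train on English XNLI,
# evaluate on Arabic XNLI with no Arabic fine-tuning. Library choices
# and hyperparameters are assumptions, not the paper's exact recipe.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)  # entailment / neutral / contradiction

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

train_en = load_dataset("xnli", "en", split="train").map(tokenize, batched=True)
test_ar = load_dataset("xnli", "ar", split="test").map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-xnli-en",
                           per_device_train_batch_size=32,
                           num_train_epochs=2,
                           learning_rate=2e-5),
    train_dataset=train_en,   # English supervision only
    eval_dataset=test_ar,     # Arabic evaluation only
    tokenizer=tokenizer,
    compute_metrics=accuracy,
)

trainer.train()
print(trainer.evaluate())     # zero-shot accuracy on Arabic
```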
7) Conclusion
This work demonstrates the surprising effectiveness of multilingual models over monolingual models, and shows strong improvements on low-resource languages.
8) Code is easy!
https://github.com/pytorch/fairseq/tree/master/examples/xlmr
https://huggingface.co/transformers/model_doc/xlmroberta.html
https://youtu.be/mGdg_iPoXTs?t=724
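As a starting point, here is a minimal usage sketch via torch.hub, roughly following the fairseq example repository linked above (the exact API may differ between fairseq versions):

```python
# Load a released XLM-R checkpoint through torch.hub (fairseq).
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.base')  # or 'xlmr.large'
xlmr.eval()

tokens = xlmr.encode('مرحبا بالعالم')        # SentencePiece encoding, any language
features = xlmr.extract_features(tokens)     # last-layer hidden states
print(features.shape)                        # (1, sequence_length, 768)
```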
Regards