Especially for challenging domains like user-generated content and low-resource languages?
This is the core question driving recent research in natural language processing (NLP) and machine translation (MT) by PAI’s Dr Diptesh Kanojia, who has had six publications accepted, has co-organised a shared task on quality estimation at WMT 2024, and is presenting part of this body of work at the prestigious Conference on Empirical Methods in Natural Language Processing (EMNLP) 2024 in November. The work spans the evaluation of large language models (LLMs) for MT, novel techniques for multilingual automatic post-editing, the evaluation of MT systems for emotion-laden user-generated content, and the challenges of product retrieval and ranking, demonstrating the broad applicability of NLP, MT and information retrieval in real-world scenarios.
Three of these publications are led by Shenbin Qian, PhD student at Surrey’s Centre for Translation Studies (CTS), and five are co-authored by PAI Fellow Constantin Orasan, Professor of Language and Translation Technologies at the CTS. Warm congratulations to the team on their distinguished work.
- C. Zerva, F. Blain, J. G. C. de Souza, D. Kanojia, S. Deoghare, N. M. Guerreiro, G. Attanasio, R. Rei, C. Orăsan, M. Negri, M. Turchi, R. Chatterjee, P. Bhattacharyya, M. Freitag, and A. F. T. Martins, "Findings of the QE Shared Task at WMT 2024: Are LLMs Closing the Gap in QE?", In Proceedings of the Conference on Machine Translation (WMT) at EMNLP 2024. A shared task challenge for the machine translation community, co-organised by researchers from many universities and industry organisations such as Unbabel, Apple, and Google. www2.statmt.org/wmt24/pdf/2024.wmt-1.3.pdf
- S. Qian, A. Sindhujan, M. Kabra, D. Kanojia, C. Orăsan, T. Ranasinghe, and F. Blain, "What do Large Language Models Need for Machine Translation Evaluation?", In Proceedings of EMNLP 2024 (Main conference) [Preprint]: a group effort from researchers within PAI, CTS, Tilburg University, and Lancaster University.
- S. Deoghare, D. Kanojia, and P. Bhattacharyya, "Together We Can: Multilingual Automatic Post-Editing for Low-Resource Languages", EMNLP 2024 (Findings of EMNLP 2024) [Preprint]: a pivotal paper on multilingual automatic post-editing (APE) that uses a novel multi-task learning approach to build a single APE system for automatically correcting translations in multiple low-resource languages.
- S. Qian, C. Orasan, D. Kanojia, and F. do Carmo, "A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content", In Proceedings of the Conference on Machine Translation (WMT) at EMNLP 2024. [Preprint]: accepted at the WMT conference co-located with EMNLP 2024, this paper proposes a novel combined loss function for evaluating translations of user-generated content.
- S. Qian, C. Orasan, D. Kanojia, and F. do Carmo, "Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?", In Proceedings of the Workshop on Asian Translation (WAT) at EMNLP 2024. [Preprint]: accepted at the WAT workshop co-located with EMNLP 2024, this paper addresses an important open question for the MT evaluation community within NLP. It is a benchmarking study that offers novel insights into the challenges of translating user-generated content (UGC), even when using LLMs.
- H. Saadany, S. Bhosale, S. Agrawal, D. Kanojia, C. Orasan, and Z. Wu, "Centrality-aware Product Retrieval and Ranking", In Proceedings of EMNLP 2024. [Preprint]: an industry track paper in collaboration with eBay. This is an outcome of the eBay-sponsored research at the University of Surrey, where we improve the performance of their internal foundation model by exploiting the user intent present in centrality information.
Senior Lecturer at Institute for People-Centred AI and School of CS and EE, University of Surrey | Natural Language Processing