A look back at the NL-Augmenter collaborative AI project and the issue of equity in group representation

Is AI socially fair?

The problem of fairness in group representation is an active research area in Natural Language Processing (NLP), and its importance cannot be neglected given the growing reliance of human decisions on expert systems built on AI models. It becomes crucial in legal or business domains: a biased model might influence the severity of a sentence handed down by a judge through an ethnicity bias, aggravate gender imbalance during CV screening, or negatively affect the attribution of a bank credit because of age, ethnicity or gender biases present in the system.

How is bias introduced?

Bias can be introduced into an AI model at several stages: during dataset construction, during training (depending on how well the model encodes information into the embedding space), or during finetuning, where both the finetuning dataset and the model hyperparameters influence the model's internal representations. When bias is present at several stages, it becomes algorithmic and hard to detect.

What solutions exist?

Intrinsic bias measurement methods, such as PCA (Principal Component Analysis) or WEAT (Word Embedding Association Test), have shown no correlation with the results of extrinsic methods. Extrinsic gender bias datasets, such as WinoBias, Winogender, StereoSet and CrowS-Pairs, contain inconsistencies and are far from optimal.
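To illustrate how an intrinsic measure of this kind works, here is a minimal sketch of the WEAT effect size, assuming the word vectors are supplied by the user as a dictionary of numpy arrays and the target and attribute sets as lists of words; it is an illustration, not a reference implementation.

```python
# A minimal sketch of the WEAT effect size, assuming `vec` maps words to numpy vectors.
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B, vec):
    """s(w, A, B): mean similarity of w to attribute set A minus mean similarity to B."""
    return (np.mean([cosine(vec[w], vec[a]) for a in A])
            - np.mean([cosine(vec[w], vec[b]) for b in B]))

def weat_effect_size(X, Y, A, B, vec):
    """Difference of mean associations of target sets X and Y,
    normalised by the standard deviation over all target words."""
    s_X = [association(x, A, B, vec) for x in X]
    s_Y = [association(y, A, B, vec) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)
```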

Moreover, the existing solutions take the biased datasets as a given, proposing debiasing methods either during the pre-training phase (by hiding information that might be a source of potential bias, such as gender or age) or during the finetuning phase, where debiasing terms are introduced into the loss to correct the latent representations the model learned during training.
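As an illustration of the finetuning-time approach, the sketch below adds a counterfactual penalty to the task loss, pushing the model towards identical predictions on an input and its gender-swapped copy. It is a simplified, hypothetical example: `model` and `swap_gender` are placeholders, and this is not the exact algorithm of any particular published method.

```python
# A simplified sketch of a debiasing term added to the finetuning loss.
# `model` and `swap_gender` are hypothetical placeholders.
import torch.nn.functional as F

def debiased_loss(model, inputs, labels, swap_gender, lam=0.1):
    """Task loss plus a penalty for diverging predictions on gender-swapped inputs."""
    logits = model(inputs)                       # predictions on the original batch
    task_loss = F.cross_entropy(logits, labels)

    swapped_logits = model(swap_gender(inputs))  # predictions on the swapped batch
    # Penalise divergence between the two prediction distributions.
    bias_penalty = F.kl_div(F.log_softmax(swapped_logits, dim=-1),
                            F.softmax(logits, dim=-1),
                            reduction="batchmean")
    return task_loss + lam * bias_penalty
```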

Where does the NL-Augmenter project come from?

The NL-Augmenter framework is a subproject of the global Natural Language Generation, Evaluation & Metrics (GEM) project. GEM is an international initiative, supported by the Google Research team, which brought together academic and industrial researchers whose focus lies in the domain of Natural Language Generation (NLG). The list of academic participants includes Harvard University, Stanford University, the Allen Institute for AI and Georgia Tech, while the list of organisations includes, but is not limited to, IBM Research, Microsoft and Hugging Face.

As a PhD holder and a researcher in the field of conditional language generation, I had already taken part in the first phase of the GEM challenge, testing the performance of non-autoregressive conditional generative language models on the proposed datasets. This work resulted in a workshop participation under the umbrella of the ACL 2021 conference, a key event in the field, as well as a participation in the Soph.I.A Summit 2021, highlighted by the ActuIA magazine. NL-Augmenter emerged during the second phase of GEM, enlarging the scope of the project by proposing practical tools for dataset filtering and enhancement.

Solution proposed by Inetum

In view of the bias problem, it was important to propose a viable solution for detecting bias and group inequity at the dataset construction phase, as one of the tools NL-Augmenter exposes.

Alongside a “plug and play” quadrilingual gender bias filter, which detects gender imbalance in English, French, Polish and Russian according to five categories of lexical seeds (including personal pronouns, words defining relations, titles and names), I also added a language-agnostic universal bias filter, which allows the user to define lexical seeds in a language of preference.
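To give an idea of the mechanism, the sketch below reproduces the lexical-seed principle behind such a filter: count gendered seeds and keep only reasonably balanced utterances. The seed lists and the threshold are short illustrative placeholders, not the actual lexicons or logic shipped with the NL-Augmenter filter.

```python
# A minimal sketch of a lexical-seed gender imbalance check (illustrative seeds only).
import re

GENDER_SEEDS = {
    "en": {"male": {"he", "him", "his", "mr", "father", "son"},
           "female": {"she", "her", "hers", "mrs", "ms", "mother", "daughter"}},
    "fr": {"male": {"il", "lui", "monsieur", "père", "fils"},
           "female": {"elle", "madame", "mère", "fille"}},
}

def gender_counts(text, lang="en"):
    """Count male and female seed occurrences in a text for the given language."""
    tokens = re.findall(r"\w+", text.lower())
    return {gender: sum(tok in seeds for tok in tokens)
            for gender, seeds in GENDER_SEEDS[lang].items()}

def keep(text, lang="en", max_ratio=2.0):
    """Return True when the utterance is kept, i.e. no strong gender imbalance is found."""
    counts = gender_counts(text, lang)
    m, f = counts["male"], counts["female"]
    if m == 0 and f == 0:
        return True      # no gendered seeds at all
    if m == 0 or f == 0:
        return False     # only one gender is represented
    return max(m, f) / min(m, f) <= max_ratio
```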

A more sophisticated bilingual (English and French) group inequity filter helps to discover potential discrimination issues in a text corpus. The extrinsic nature of these filters eliminates the interpretability problem, as the explicit categorisation makes it clear which lexical seeds caused a dataset utterance to be flagged as biased.
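The interpretability argument can be made concrete with a small sketch that reports, per category, which seeds triggered the flag; again, the categories and seed lists are illustrative placeholders rather than the filter's real lexicons.

```python
# A sketch of per-category explanations for a flagged utterance (illustrative seeds only).
import re

CATEGORY_SEEDS = {
    "pronouns": {"he", "she", "him", "her"},
    "titles":   {"mr", "mrs", "ms"},
    "kinship":  {"father", "mother", "son", "daughter"},
}

def explain(utterance):
    """Return, for each category, the seeds found in the utterance."""
    tokens = set(re.findall(r"\w+", utterance.lower()))
    return {cat: sorted(tokens & seeds)
            for cat, seeds in CATEGORY_SEEDS.items() if tokens & seeds}

print(explain("Mr Smith told his daughter that she should wait."))
# -> {'pronouns': ['she'], 'titles': ['mr'], 'kinship': ['daughter']}
```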

ActuIA article on the Soph.I.A Summit: https://www.actuia.com/contribution/isabelle-galy/soph-i-a-summit-des-recherches-avancees-pour-ameliorer-lia/

The flexibility of their API allows the extension of bias categories and the addition of new lexical seeds, to better suit user needs. This contribution is highlighted in a joint scientific article with the other NL-Augmenter contributors, which has been submitted to a prestigious NLP conference.
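As a purely hypothetical illustration (the real NL-Augmenter API may differ), extending the structure used in the sketches above amounts to adding seeds to an existing category or declaring a new one:

```python
# Illustrative extension of the seed dictionary from the sketch above,
# not the actual NL-Augmenter API.
CATEGORY_SEEDS["titles"] |= {"dr", "prof"}                    # extra seeds
CATEGORY_SEEDS["occupations"] = {"nurse", "engineer", "ceo"}  # a new user-defined category
```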

Written by Anna Shvets, Researcher and Deep Learning Engineer

