Made it home from a great time at the Conference of the European Chapter of the Association for Computational Linguistics (#EACL2024) in Malta. Another full day in the air—ugh—but it was worth it! Such a well-organised conference and an intellectually stimulating experience. It was a pleasure to present our paper about Aboriginal and Torres Strait Islander languages and meet many smart people doing good work in natural language processing (NLP). Below are a couple of papers I enjoyed from each of the conference’s three days, with a focus on multilingualism and ethics in NLP:
- "Centering the Speech Community" by
Steven Bird
and Dean Yibarbuk was a real highlight, and I was thrilled to present our paper right after Steven in a session on multilingualism. It has a narrative style that shows the evolving relationship between the authors (and the local community) over five years as they navigated differing understandings of language. In the discussion, they note that the NLP community often resists approaches that don't work across many languages. However, this paper challenges such priorities by demonstrating the benefits of a different focus – how language technologies can enhance local agency and knowledge sharing when developed with the community's needs at the forefront.
- "Code-Switched Language Identification is Harder Than You Think" by
Laurie Burchell
and others tackles the challenge of language identification (LID) for code-switched text—when multiple languages are mixed within the same utterance. They reformulate LID as a multi-label task (meaning each text can have multiple language labels) and compare models on code-switched datasets spanning up to eight languages, finding that even the best models struggle to recognise all languages present. I found this paper interesting for highlighting the gaps between LID performance in constrained settings vs. realistic code-switched data.
- "AnthroScore: A Computational Linguistic Measure of Anthropomorphism" by
Myra Cheng
and others introduces an automatic metric to quantify how much language anthropomorphises an entity by comparing the probability of human vs. non-human pronoun references based on context. By applying their metric to research papers, they found a steady increase in anthropomorphic language over time, especially for language and multimodal models. They also found higher anthropomorphism in news than in research papers, but I checked with Myra after the presentation, and they didn’t find any change in anthropomorphic language in news about AI over time. I’ve been thinking about what their findings mean for the increasingly public debate about AI, and our responsibilities around language as researchers.
- "Leveraging Implicit Feedback from Deployment Data in Dialogue", presented by Jason Weston, explored improving conversational AI models by learning from human-bot conversations, without explicit human annotations. The key idea is to extract implicit signals of conversation quality from user behaviours, like response length or sentiment. I have been looking into external feedback signals (e.g., thumbs up/down, open-text comments), but this paper taps into the potential of user interaction data to refine dialogue models.
- In "Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models",
Natalie Shapira
and others conducted an extensive evaluation of 6 tasks to investigate the extent of large language models' (LLMs) theory of mind (ToM) abilities. While LLMs exhibited certain ToM capabilities, the authors found that this behaviour was far from robust, with LLMs struggling on adversarial examples, indicating reliance on "shallow heuristics" rather than genuine ToM abilities. This paper also reinforced that we should be cautious about using human psychological tests to evaluate AI, as the consequences don't straightforwardly transfer.
- In "Examining Gender and Racial Bias in Large Vision–Language Models Using a Novel Dataset of Parallel Images",
Kathleen C. Fraser
and
Svetlana Kiritchenko
examine how multimodal AI models handle images of people of different genders and races. First, they created a dataset containing sets of AI-generated images of people that varied only in the depicted person's gender and skin tone. Then, they queried the models in a series of experiments to describe the images in the dataset. All models they tested demonstrated biases, particularly associating men with stereotypically masculine occupations and Black individuals with crime. Evidently, the risk of perpetuating stereotypes in emerging AI systems is still very real!
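For anyone curious how the AnthroScore idea works in practice, here is a minimal sketch as I understood it from the talk: mask the entity mention in a sentence and ask a masked language model how likely human vs. non-human pronouns are to fill the slot. This is not the authors' implementation; the model choice and pronoun lists below are placeholder assumptions.

```python
# Minimal sketch of the masked-LM comparison behind AnthroScore (not the
# authors' code): roberta-base and the pronoun lists are placeholder choices.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

HUMAN_PRONOUNS = ["he", "she"]
NONHUMAN_PRONOUNS = ["it"]

def anthro_ratio(masked_sentence: str) -> float:
    """Ratio of the model's probability mass on human vs. non-human pronouns
    for the <mask> slot (higher = more anthropomorphic framing)."""
    preds = fill_mask(masked_sentence, targets=HUMAN_PRONOUNS + NONHUMAN_PRONOUNS)
    scores = {p["token_str"].strip().lower(): p["score"] for p in preds}
    p_human = sum(scores.get(t, 0.0) for t in HUMAN_PRONOUNS)
    p_nonhuman = sum(scores.get(t, 0.0) for t in NONHUMAN_PRONOUNS)
    return p_human / max(p_nonhuman, 1e-12)

# The entity mention ("the model") has been replaced with the mask token.
print(anthro_ratio("After reading the prompt, <mask> refused to answer."))
```

As I recall, the paper aggregates a log-scaled version of this comparison over many entity mentions in a document; the toy version above just scores a single sentence.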
There were plenty more interesting papers, a couple of keynotes, and a fantastic two-day workshop on computational methods for endangered languages to finish the week. But the real star of the week was Malta itself—crystal blue waters, historic forts, friendly people, and mad hikes. A big thank you to Ben Hutchinson and Google Research for making my attendance possible.
Lead - Healthy Connections by Curtin, Pilbara Health Challenge | Senior Lecturer in Computing | Academic Lead, Innovation Central Perth
Ned Cooper we found your paper "It's how you do things that matters..." incredibly insightful and will make sure our focus is on many of the approaches you suggest. My team is looking forward to talking with you and Ben Hutchinson on our joint interests. Tristan Carlisle Alastair Kho Prasanna Asokan
AI + Discrimination Researcher | Lawyer | Consultant
Thanks for sharing these papers!
Technology, AI, and Policy Researcher | PhD Candidate | Casual Academic
Such important work!
Congratulations Ned! Sounds like you were in a really supportive, beautiful, and interesting place!
Responsible AI & Society Researcher | Former UW Board Member | Ex-Google, Apple, Microsoft, Meta | Innovator, Consultant, Pilot, Investor
It sounds like you had an amazing time! I really wanted to attend in Malta, but things didn't line up as expected. Thanks for sharing highlights of your time there. We should catch up soon!