Insights from ACL 2024 Bangkok: Advancing AI, LLMs and NLP

When developing AI products like Dira, much of our time is dedicated to collaborating with internal and external clients, gathering product requirements, collecting data and training models, and integrating them with our back-end services and front-end interfaces.

Attending the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) in Bangkok provided a valuable opportunity to learn from experts in the field and explore new methodologies and applications.


The conference featured numerous engaging workshops, tutorials, and presentations on various topics.

Overview

Prof. Subbarao Kambhampati from Arizona State University discussed the question, “Can LLMs Reason and Plan?” He concluded that while LLMs struggle with planning, they can assist in planning when used in LLM-Modulo frameworks alongside external verifiers and solvers. For example, projects like AlphaProof and AlphaGeometry leverage fine-tuned LLMs to enhance the accuracy of predictions.

https://x.com/rao2z/status/1733311474716885423

Prof. Barbara Plank from LMU Munich delivered another notable keynote titled “Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!” Prof. Plank addressed current challenges in NLP, such as biases, robustness, and explainability, and advocated for embracing variation in inputs, outputs, and research to rebuild trust in LLMs. She pointed out that despite the power gained through advances like deep learning, trust has diminished due to issues such as bias. Prof. Plank suggested that understanding uncertainty and embracing variation—especially in model inputs and outputs—is key to developing more trustworthy NLP systems.

Trust arises from knowledge of origin as well as from knowledge of functional capacity

Another highlight was the “Challenges and Opportunities with SEA LLMs” panel. Chaired by Lun-Wei Ku, it featured insights from experts like Prof. Sarana Nutanong from VISTEC, Prof. Ayu Purwarianti from ITB Indonesia, and William Tjhi from AI Singapore. They discussed the development of LLMs in Southeast Asia, emphasizing the importance of quality data collection and annotation for regional languages.

Detailed Insights

Prof. Subbarao Kambhampati from Arizona State University discussed the topic “Can LLMs Reason and Plan?” The takeaway: LLMs struggle with planning, but...

To illustrate, consider this scenario: “If block C is on top of block A and block B is separately on the table, how can you create a stack with block A on top of block B and block B on top of block C without moving block C?”

Even though this is impossible, ChatGPT 4o mistakenly attempts to comply, moving block C in the first step.

This conclusion is backed by in-depth research, such as the study “On the Planning Abilities of Large Language Models” and its accompanying benchmark results.

Results on PlanBench

The silver lining is that while LLMs aren’t proficient at planning, they can assist with planning when integrated into LLM-Modulo frameworks and used alongside external verifiers and solvers.

LLMs as Idea Generators

For instance, AlphaProof and AlphaGeometry utilize fine-tuned LLMs to enhance the accuracy of their predictions, as detailed here.
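
To make the LLM-Modulo idea more concrete, below is a minimal sketch, assuming a generate-and-verify loop in which the LLM only proposes candidate plans and an external, sound verifier replays and checks them. The `propose` callable, the toy blocks-world rules, and the frozen-block constraint are illustrative assumptions, not the actual frameworks or solvers discussed in the keynote.

```python
"""Minimal sketch of an LLM-Modulo style generate-and-verify loop (illustrative only).

`propose` stands in for a call to an LLM that returns a candidate plan as a list of
(block, destination) moves; the verifier is a toy blocks-world checker, not the
external solvers discussed in the keynote.
"""
from typing import Callable, Dict, List, Optional, Set, Tuple

State = Dict[str, str]          # block -> what it rests on ("table" or another block)
Plan = List[Tuple[str, str]]    # sequence of (block, destination) moves


def apply_move(state: State, block: str, dest: str) -> Optional[State]:
    """Return the state after moving `block` onto `dest`, or None if the move is illegal."""
    if any(support == block for support in state.values()):
        return None                      # something is stacked on top of `block`
    if dest != "table" and any(support == dest for support in state.values()):
        return None                      # destination block is covered
    new_state = dict(state)
    new_state[block] = dest
    return new_state


def verify_plan(start: State, goal: State, plan: Plan, frozen: Set[str]) -> bool:
    """External verifier: replay the plan, enforce constraints, and check the goal."""
    state: Optional[State] = start
    for block, dest in plan:
        if block in frozen:              # e.g. "without moving block C"
            return False
        state = apply_move(state, block, dest)
        if state is None:
            return False
    return all(state.get(b) == on for b, on in goal.items())


def llm_modulo_plan(propose: Callable[[int], Plan], start: State, goal: State,
                    frozen: Set[str], budget: int = 5) -> Optional[Plan]:
    """Query the proposer until the verifier accepts a plan or the budget is exhausted."""
    for attempt in range(budget):
        candidate = propose(attempt)
        if verify_plan(start, goal, candidate, frozen):
            return candidate
    return None                          # no verified plan; fall back to a classical planner


# Toy usage mirroring the scenario above: C on A, B on the table;
# goal is A on B and B on C, without moving C (which is impossible).
start = {"C": "A", "A": "table", "B": "table"}
goal = {"A": "B", "B": "C"}

def fake_llm(attempt: int) -> Plan:
    # Stand-in for LLM proposals; the first one repeats the mistake of moving C.
    proposals = [
        [("C", "table"), ("B", "C"), ("A", "B")],
        [("B", "C"), ("A", "B")],
    ]
    return proposals[attempt % len(proposals)]

print(llm_modulo_plan(fake_llm, start, goal, frozen={"C"}))  # -> None (correctly rejected)
```

The point of this design is that correctness rests entirely on the verifier; the LLM acts purely as an idea generator.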

Another noteworthy presentation was given by Prof. Barbara Plank, Professor of AI and Computational Linguistics at LMU Munich, titled “Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!”

Prof. Plank highlighted current challenges that have contributed to a decline in trust in LLMs. To address this, she advocates embracing variation in three key areas: model inputs, model outputs, and research practices.

Historically, NLP has evolved through significant phases, starting with symbolic processing, then statistical processing (feature engineering), and now deep learning.

While these advancements have brought power, they’ve also eroded trust due to bias and lack of robustness and explainability.

“Trust stems from understanding both the origin and functional capacity” [Hays. Applications. ACL 1979].

Let’s focus on model evaluation, specifically D3. For example, in Multiple Choice Question Answering (MCQA), simply reversing the order of the Yes/No answer options can change LLM performance, a phenomenon known as the “A” bias in MCQA responses. This bias has been observed across various language models, all of which tend to favour the answer “A.”

“A” bias in MCQA responses
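
A simple way to probe this kind of position bias is to score the same question under every ordering of the answer options and count which position wins. The sketch below is a generic illustration; `score_options` is an assumed stand-in for whatever call returns one score per presented option, not a specific evaluation harness.

```python
# Sketch: probe position ("A") bias in MCQA by scoring every ordering of the options.
from itertools import permutations

def probe_position_bias(question, options, score_options):
    """Count how often each answer *position* (A, B, C, ...) wins across all orderings.

    A position-insensitive model should pick the same underlying option no matter
    where it appears; a strong skew toward position "A" signals order bias.
    """
    labels = [chr(ord("A") + i) for i in range(len(options))]
    position_wins = {label: 0 for label in labels}
    for ordering in permutations(options):
        scores = score_options(question, list(ordering))
        best_position = max(range(len(scores)), key=scores.__getitem__)
        position_wins[labels[best_position]] += 1
    return position_wins

# Toy usage with a fake scorer that slightly prefers the first slot,
# mimicking the "A" preference described above.
def fake_scorer(question, ordered_options):
    return [1.0 if i == 0 else 0.9 for i in range(len(ordered_options))]

print(probe_position_bias("Is the sky blue?", ["Yes", "No"], fake_scorer))
# -> {'A': 2, 'B': 0}: the first position wins under both orderings.
```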

Understanding uncertainty is crucial for building trust in models: it helps us recognize when they might be wrong, when multiple perspectives could be valid, and it deepens our understanding of their origin and functional capacity.

Embracing variation holistically for trustworthy NLP involves:

  • Input variability: including non-standard dialects.
  • Output considerations: currently, only standardized categories are accepted, often discarding differences in human labels as noise.
  • Research: focusing on human-centric perspectives and fostering research diversity.

Variation - Three Key Areas

For example, in a German dataset, it’s evident that dialects are often over-segmented by tokenizers.

Tokenizers of pre-trained models are optimized for the languages they are trained on
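
One quick way to observe this is to compare how many subword tokens a pretrained tokenizer produces for a standard-language sentence versus a dialectal rendering of the same content. The snippet below is a rough sketch using the Hugging Face `transformers` library; the model name and the example sentences are illustrative choices, not the dataset or dialects from the talk.

```python
# Rough check of tokenizer "fertility" (subword tokens per whitespace word) for
# standard vs. dialectal German. Model name and sentences are illustrative placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")

examples = {
    "standard": "Ich habe keine Zeit.",
    "dialect": "I hob koa Zeit.",  # illustrative Bavarian-style rendering
}

for name, sentence in examples.items():
    tokens = tokenizer.tokenize(sentence)
    fertility = len(tokens) / len(sentence.split())
    print(f"{name:8s} tokens={tokens} fertility={fertility:.2f}")

# Dialect text typically breaks into more, smaller subword pieces, i.e. it is
# over-segmented relative to the standard language the tokenizer was trained on.
```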

Regarding output, we frequently assume a single ground truth exists, but zooming out reveals a wealth of diversity and ambiguity. For instance, answering the question, “Is there a smile in this image?” shows that responses vary by country.

Is there a SMILE in this image?

Human label variation is a significant source of uncertainty; we typically aim to maximize agreement, minimizing this variation to enhance data quality. The lower left of the slide shows annotation errors; the challenge is distinguishing plausible variation from genuine errors.

Disagreement or Variation?
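
A common first step in telling plausible variation apart from annotation error is to look at the per-item label distribution, for instance its majority share and entropy. The function below is a minimal, generic sketch of that idea rather than the analysis presented in the talk.

```python
# Sketch: summarize per-item annotator labels to help separate plausible human
# variation from likely annotation errors.
from collections import Counter
from math import log2

def label_profile(labels):
    """Return the majority label, its share, and the entropy of the label distribution.

    Low entropy with an implausible majority label points to an annotation error;
    high entropy across defensible labels points to genuine human variation.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * log2(p) for p in probs)
    majority_label, majority_count = counts.most_common(1)[0]
    return {
        "majority_label": majority_label,
        "majority_share": round(majority_count / total, 2),
        "entropy_bits": round(entropy, 3),
    }

# Toy usage: five annotators answering "Is there a smile in this image?"
print(label_profile(["yes", "yes", "no", "yes", "no"]))    # split verdict: likely variation
print(label_profile(["yes", "yes", "yes", "yes", "yes"]))  # unanimous: little uncertainty
```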

Lastly, I’d like to highlight an excellent panel on “Challenges and Opportunities with SEA LLMs,” which explored the unique challenges and opportunities of LLMs in Southeast Asia (SEA). The panel, chaired by Lun-Wei Ku, featured:

Prof. Sarana Nutanong shared insights about WangChanX, which involves fine-tuning existing models while developing high-quality Thai instruction data. Initially, instruction pairs were translated from English, but the focus has since shifted to improving both quality and quantity and to covering common domains (finance, medical, legal, and retail). The creation process includes data collection, annotation, quality checks, and final review.


Overview of WangChanX

Prof. Ayu Purwarianti discussed Indonesia's linguistic diversity, spanning some 700 dialects, and outlined the five phases of NLP research in the country.


Indonesian & Ethnic NLP Resources (Tools & Data)

The fifth phase (2020-present) sees Indonesian researchers sharing NLP data and resources, leading to over 200 publications annually.

Indonesian NLP Research

NusaCrowd is an Indonesian NLP Data Catalogue consolidating over 200 datasets.

NusaCrowd
Benchmark for low-resource languages

Cendol is an open-source collection of fine-tuned generative LLMs for Indonesian languages. It features both decoder-only and encoder-decoder transformer architectures with scales ranging from 300 million to 13 billion parameters.


Cendol
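
For readers who want to try such checkpoints, the usual Hugging Face `transformers` loading pattern is sketched below; the model identifier is a hypothetical placeholder (consult the Cendol release for actual checkpoint names), and decoder-only versus encoder-decoder variants load through different Auto classes.

```python
# Generic Hugging Face loading pattern for generative checkpoints like those in
# Cendol. NOTE: "org/cendol-checkpoint" is a hypothetical placeholder, not a
# verified model ID -- check the Cendol release for real checkpoint names.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "org/cendol-checkpoint"  # placeholder; decoder-only variant assumed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# Encoder-decoder variants would load with AutoModelForSeq2SeqLM instead.

prompt = "Jelaskan apa itu pemrosesan bahasa alami dalam satu kalimat."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```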

William Tjhi, head of applied research at AI Singapore, presented the Southeast Asian Languages in One Network (SEA-LION) project, which covers 12 official languages across 11 nations, with hundreds of dialects.

The Regional Network

SeaCrowd: A significant part of the project involves consolidating open datasets for Southeast Asian languages.

SeaCrowd

Project SEALD: This initiative focuses on creating new datasets essential for the region, promoting inclusivity.

It was great connecting with Leslie Teo, Akriti Vij, Andreas Tjendra, Trevor Cohn, Partha Talukdar, Pratyusha Mukherjee, Ee-Peng Lim, Erika Fille Legara, Jimson Paulo Layacan, Kasima Tharnpipitchai, Koo Ping Shung, Kunat Pipatanakul, Potsawee Manakul, Thadpong Pongthawornkamol, Brandon Ong, Raymond Ng, Rengarajan Hamsawardhini, Bryan Siow, Leong Wai Yi, Darius Liu (CFA, CAIA), Kok Wai (Walter) TENG, Wayne Lau, and Wei Qi Leong.

Thank you!

* This summary captures only the key concepts from the presentations. I encourage you to explore the relevant resources further for a deeper understanding.

