Insights from ACL 2024 Bangkok: Advancing AI, LLMs and NLP

Ofir Shalev

Group Chief Data Officer (CDO) | Ex CTO/CIO

Published: August 21, 2024

When developing AI products like Dira, much of our time is dedicated to collaborating with internal and external clients, gathering product requirements, collecting data and training models, and integrating them with our back-end services and front-end interfaces.

Attending the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) in Bangkok provided a valuable opportunity to learn from experts in the field and explore new methodologies and applications.

The conference featured numerous engaging workshops, tutorials, and presentations on various topics.

Overview

Prof. Subbarao Kambhampati from Arizona State University discussed the question, “Can LLMs Reason and Plan?” He concluded that while LLMs struggle with planning, they can assist in planning when used in LLM-Modulo frameworks alongside external verifiers and solvers. For example, projects like AlphaProof and AlphaGeometry leverage fine-tuned LLMs to enhance the accuracy of predictions.

https://x.com/rao2z/status/1733311474716885423

Prof. Barbara Plank from LMU Munich delivered another notable keynote titled “Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!” Prof. Plank addressed current challenges in NLP, such as biases, robustness, and explainability, and advocated for embracing variation in inputs, outputs, and research to rebuild trust in LLMs. She pointed out that despite the power gained through advances like deep learning, trust has diminished due to issues such as bias. Prof. Plank suggested that understanding uncertainty and embracing variation—especially in model inputs and outputs—is key to developing more trustworthy NLP systems.

Trust arises from knowledge of origin as well as from knowledge of functional capacity

Another highlight was the “Challenges and Opportunities with SEA LLMs” panel. Chaired by Lun-Wei Ku, it featured insights from experts like Prof. Sarana Nutanong from VISTEC, Prof. Ayu Purwarianti from ITB Indonesia, and William Tjhi from AI Singapore. They discussed the development of LLMs in Southeast Asia, emphasizing the importance of quality data collection and annotation for regional languages.

Detailed Insights

Prof. Subbarao Kambhampati from Arizona State University discussed the topic “Can LLMs Reason and Plan?” The takeaway: LLMs struggle with planning, but...

To illustrate, consider this scenario: “If block C is on top of block A and block B is separately on the table, how can you create a stack with block A on top of block B and block B on top of block C without moving block C?”

Even though this is impossible, ChatGPT 4o mistakenly attempts to comply, moving block C in the first step.

This conclusion is backed by in-depth research, such as the study titled “ON THE PLANNING ABILITIES OF LARGE LANGUAGE MODELS,” supported by recent statistics.

The silver lining is that while LLMs aren’t proficient at planning, they can assist with planning when integrated into LLM-Modulo frameworks and used alongside external verifiers and solvers.

For instance, AlphaProof and AlphaGeometry utilize fine-tuned LLMs to enhance the accuracy of their predictions, as detailed here.
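The generate-and-verify idea behind an LLM-Modulo setup can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation and not how AlphaProof or AlphaGeometry work internally: the `propose` callable is a hypothetical stand-in for an LLM call, and `verify_plan`, `llm_modulo`, and `make_stub_proposer` are names invented for this example. The verifier is a small blocks-world checker that reuses the scenario above.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

State = Dict[str, str]   # block -> what it rests on ("table" or another block)
Move = Tuple[str, str]   # (block to move, destination)

# The scenario from the text: C sits on A; A and B sit on the table.
START: State = {"A": "table", "B": "table", "C": "A"}
GOAL: State = {"A": "B", "B": "C"}


def is_clear(state: State, block: str) -> bool:
    """A block is clear when no other block rests on it."""
    return all(support != block for support in state.values())


def verify_plan(start: State, goal: State, plan: List[Move],
                frozen: Iterable[str] = ()) -> Optional[str]:
    """External verifier: return None for a valid plan, else a critique string."""
    state = dict(start)
    frozen = set(frozen)
    for step, (block, dest) in enumerate(plan, 1):
        if block in frozen:
            return f"step {step}: {block} must not be moved"
        if not is_clear(state, block):
            return f"step {step}: {block} has a block on top of it"
        if dest != "table" and (dest == block or not is_clear(state, dest)):
            return f"step {step}: cannot place {block} on {dest}"
        state[block] = dest
    if any(state[block] != support for block, support in goal.items()):
        return "the final state does not satisfy the goal"
    return None


def llm_modulo(propose: Callable[[Optional[str]], List[Move]],
               start: State, goal: State,
               frozen: Iterable[str] = (),
               max_rounds: int = 5) -> Optional[List[Move]]:
    """Ask for a plan, verify it, and feed the critique back until it passes."""
    critique: Optional[str] = None
    for _ in range(max_rounds):
        plan = propose(critique)
        critique = verify_plan(start, goal, plan, frozen)
        if critique is None:
            return plan
    return None  # no verified plan within the budget


def make_stub_proposer(candidates: List[List[Move]]):
    """Hypothetical stand-in for an LLM: replays canned plans, ignoring the critique."""
    remaining = iter(candidates)

    def propose(critique: Optional[str]) -> List[Move]:
        return next(remaining, [])

    return propose
```

The design point is that the loop only ever returns plans the verifier has accepted, so an incorrect proposal (such as moving block C when C is frozen) is rejected rather than passed on. A real setup would replace the stub with a model call that receives the critique in its prompt.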

Another noteworthy presentation was given by Prof. Barbara Plank, Professor of AI and Computational Linguistics at LMU Munich, titled “Are LLMs Narrowing Our Horizon? Let’s Embrace Variation in NLP!”

Prof. Plank highlighted current challenges that have contributed to a decline in trust in LLMs. To address this, she advocates embracing variation in three key areas: model inputs, model outputs, and research practices.

Historically, NLP has evolved through significant phases, starting with symbolic processing, then statistical processing (feature engineering), and now deep learning.

While these advancements have brought power, they’ve also eroded trust due to bias and lack of robustness and explainability.

“Trust stems from understanding both the origin and functional capacity” [Hays. Applications. ACL 1979].

Let’s focus on model evaluation, specifically D3. For example, in Multiple Choice Question Answering (MCQA), simply reversing the order of Yes-No questions can influence LLM performance, a phenomenon known as LLM's “A” bias in MCQA responses. This bias has been observed across various language models, all tending to favour the answer “A.”
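One way to probe this kind of order sensitivity is to ask the same question under every ordering of the answer options and count which letter gets picked. The sketch below is only an illustration under assumptions: `ask` is a hypothetical callable standing in for a model call, and `position_bias` and `content_consistency` are names made up for this example rather than metrics from the talk.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, Sequence

LETTERS = "ABCD"

# Hypothetical model interface: takes a question and the ordered options,
# returns the letter of the option it picks.
AskFn = Callable[[str, Sequence[str]], str]


def position_bias(ask: AskFn, question: str, options: Sequence[str]) -> Dict[str, float]:
    """Share of orderings in which each answer letter was picked."""
    orderings = list(permutations(options))
    picks = Counter(ask(question, ordering) for ordering in orderings)
    return {LETTERS[i]: picks[LETTERS[i]] / len(orderings)
            for i in range(len(options))}


def content_consistency(ask: AskFn, question: str, options: Sequence[str]) -> float:
    """Share of orderings in which the most frequently chosen option text is chosen."""
    orderings = list(permutations(options))
    chosen = Counter()
    for ordering in orderings:
        letter = ask(question, ordering)
        chosen[ordering[LETTERS.index(letter)]] += 1
    return chosen.most_common(1)[0][1] / len(orderings)


def always_a(question: str, options: Sequence[str]) -> str:
    """Toy model with a pure position bias: it always answers "A"."""
    return "A"


def picks_yes(question: str, options: Sequence[str]) -> str:
    """Toy model that follows the content: it finds "Yes" wherever it is listed."""
    return LETTERS[list(options).index("Yes")]
```

A model that tracks content keeps choosing the same option text as the order changes, so its letter picks spread out across positions. A model with an “A” bias does the opposite: the letter stays fixed while the chosen text changes.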

Understanding uncertainty is crucial for building trust in models: it means recognizing when they might be wrong, recognizing when multiple perspectives could be valid, and improving our understanding of their origin and functional capacity.

Embracing variation holistically for trustworthy NLP involves:

Input variability: including non-standard dialects.
Output considerations: currently, only standardized categories are accepted, often discarding differences in human labels as noise.
Research: focusing on human-centric perspectives and fostering research diversity.

For example, in a German dataset, it’s evident that dialects are often over-segmented by tokenizers.

Tokenizers of pre-trained models are optimized for the languages they are trained on
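Over-segmentation is often quantified as "fertility", the average number of subword pieces per word. The toy example below is a sketch under stated assumptions: the four-word vocabulary is invented, the greedy longest-match tokenizer is a simplification rather than any real model's tokenizer, and the dialect sentence is only an illustrative Bavarian-style rendering, not data from the talk.

```python
from typing import List, Set

# Hypothetical vocabulary that only covers standard German word forms.
VOCAB: Set[str] = {"ich", "habe", "das", "nicht"}


def tokenize(word: str, vocab: Set[str]) -> List[str]:
    """Greedy longest-match segmentation, falling back to single characters."""
    pieces: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


def fertility(sentence: str, vocab: Set[str]) -> float:
    """Average number of subword pieces per whitespace-separated word."""
    words = sentence.split()
    total_pieces = sum(len(tokenize(word, vocab)) for word in words)
    return total_pieces / len(words)


standard = "ich habe das nicht"
dialect = "i hob des ned"  # illustrative dialect spelling of the same sentence

print(fertility(standard, VOCAB))
print(fertility(dialect, VOCAB))
```

Because none of the dialect forms are in the vocabulary, they fall apart into single characters, which is the effect the caption describes: a tokenizer fitted to one variety spends more pieces on another.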

Regarding output, we frequently assume a single ground truth exists, but zooming out reveals a wealth of diversity and ambiguity. For instance, answering the question, “Is there a smile in this image?” shows that responses vary by country.

Human label variation is a significant source of uncertainty, as we typically aim to maximize agreement to minimize this variation and enhance data quality. On the lower left, you can see Annotation Error; the challenge is distinguishing between plausible variations and actual errors.
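A common alternative to collapsing annotations into a single majority label is to keep the full label distribution and measure how spread out it is. The following is a minimal sketch with invented example annotations. High entropy only flags disagreement for review; on its own it cannot tell plausible variation apart from annotation error.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def soft_label(annotations: Sequence[str]) -> Dict[str, float]:
    """Turn raw annotator votes into a probability distribution over labels."""
    counts = Counter(annotations)
    total = len(annotations)
    return {label: n / total for label, n in counts.items()}


def entropy(distribution: Dict[str, float]) -> float:
    """Shannon entropy in bits; 0 means all annotators agree."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Invented annotations for the question "Is there a smile in this image?"
items = {
    "image_1": ["yes", "yes", "yes", "yes"],
    "image_2": ["yes", "no", "yes", "no"],
}

for name, votes in items.items():
    dist = soft_label(votes)
    print(name, dist, round(entropy(dist), 3))
```

Keeping `soft_label` output as the training or evaluation target preserves the disagreement that majority voting would discard as noise.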

Lastly, I’d like to highlight an excellent panel on “Challenges and Opportunities with SEA LLMs,” which explored the unique challenges and opportunities of LLMs in Southeast Asia (SEA). The panel, chaired by Lun-Wei Ku, featured:

Prof. Sarana Nutanong, VISTEC
Prof. Ayu Purwarianti, Institut Teknologi Bandung (ITB), Indonesia
William Tjhi, AI Singapore

Prof. Sarana Nutanong shared insights about WangChanX, which involves fine-tuning existing models while developing high-quality Thai instruction data. Initially, instruction pairs were translated from English, but the focus has shifted to improving both quality and quantity and to covering specific domains (finance, medical, legal, and retail). The creation process includes data collection, annotation, quality checks, and final review.

Prof. Ayu Purwarianti discussed Indonesia's linguistic diversity, with 700 dialects, and the five phases of research in NLP.

Indonesian & Ethnic NLP Resources (Tools & Data)

The fifth phase (2020-present) sees Indonesian researchers sharing NLP data and resources, leading to over 200 publications annually.

NusaCrowd is an Indonesian NLP Data Catalogue consolidating over 200 datasets.

Cendol is an open-source collection of fine-tuned generative LLMs for Indonesian languages. It features both decoder-only and encoder-decoder transformer architectures with scales ranging from 300 million to 13 billion parameters.

William Tjhi, head of applied research at AI Singapore, presented the Southeast Asian Languages in One Network (SEA-LION) project, which covers 12 official languages across 11 nations, with hundreds of dialects.

SeaCrowd: A significant part of the project involves consolidating open datasets for Southeast Asian languages.

Project SealD: This initiative focuses on creating new datasets essential for the region, promoting inclusivity.

It was great connecting with Leslie Teo, Akriti Vij, Andreas Tjendra, Trevor Cohn, Partha Talukdar, Pratyusha Mukherjee, Ee-Peng Lim, Erika Fille Legara, Jimson Paulo Layacan, Kasima Tharnpipitchai, Koo Ping Shung, Kunat Pipatanakul, Potsawee Manakul, Thadpong Pongthawornkamol, Brandon Ong, Raymond Ng, Rengarajan Hamsawardhini, Bryan Siow, Leong Wai Yi, Darius Liu, Kok Wai (Walter) TENG, Wayne Lau, and Wei Qi Leong.

Thank you!

* This summary captures only the key concepts from the presentations. I encourage you to explore the relevant resources further for a deeper understanding.

David de Hilster

Co-Author of NLP++ & Adjunct Professor at Northeastern University Miami

6 months ago

Here are my thoughts from the conference: https://nluglob.org/acl-2024-in-bangkok-thailand-revelations-of-old-and-new/

2 replies

Gaurav Anand

Head of Customer Engineering - Strategic Enterprises

6 months ago

Great write up Ofir Shalev . Feels as if i didnt miss the conference at all

Dimas Lagusto

Information Security & IT-GRC Leader

6 months ago

Excellent write-up! It provides key insights on latest research and applications.

Kok Wai (Walter) TENG

AI / ML Software Engineer | SMU Master of IT in Business (AI)

6 months ago

Appreciate the comprehensive write-up on ACL 2024, and it is great connecting with you at the conference!
