Watch#6: LLMs 4 Science and How to Keep Your Models Focused
In this issue:
1. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
What problem does it solve? Large language models (LLMs) are incredibly powerful tools, but they can be expensive and slow to use in long context scenarios. Because the model has to process the entire context to generate accurate and relevant outputs, cost and latency grow with prompt length, and relevant information can get lost in a very long prompt. This becomes a real problem when the context is long, as in multi-document question answering or code completion tasks.
How does it solve the problem? LongLLMLingua is a prompt compression technique designed to improve the performance of LLMs in long context scenarios. It identifies the information in the context that matters for the question at hand and produces a much shorter prompt that retains only that information, which the LLM can then use to generate accurate and relevant outputs at lower cost.
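To make the idea concrete, here is a minimal sketch of question-aware context compression. This is not the authors' implementation: LongLLMLingua scores content with a small language model, whereas the toy `importance` function below uses simple lexical overlap with the question as a stand-in, and all names and data are invented for illustration.

```python
# Minimal sketch of question-aware prompt compression (not the authors' code).
# LongLLMLingua scores passages with a small LM; a toy lexical-overlap score
# stands in for that importance measure here.
from typing import List

def importance(passage: str, question: str) -> float:
    """Toy relevance score: fraction of question words that appear in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def compress_context(passages: List[str], question: str, token_budget: int) -> str:
    """Keep the most relevant passages until the (whitespace-token) budget is spent."""
    ranked = sorted(passages, key=lambda p: importance(p, question), reverse=True)
    kept, used = [], 0
    for passage in ranked:
        n_tokens = len(passage.split())
        if used + n_tokens > token_budget:
            continue
        kept.append(passage)
        used += n_tokens
    return "\n\n".join(kept)

if __name__ == "__main__":
    docs = [
        "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
        "Bananas are botanically berries and grow in tropical climates.",
        "Gustave Eiffel's company designed and built the tower's iron lattice.",
    ]
    prompt = compress_context(docs, "Who built the Eiffel Tower and when?", token_budget=30)
    print(prompt)  # The compressed context is then prepended to the question for the LLM.
```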
What’s next? LongLLMLingua is a promising new technique for improving the performance of LLMs in long context scenarios. It has the potential to make LLMs more accessible and affordable for a wider range of users. However, more research is needed to evaluate the performance of LongLLMLingua on a wider range of tasks and datasets.
2. Human Feedback is not Gold Standard
What problem does it solve? Human feedback is increasingly being used to evaluate and train large language models (LLMs). However, it is not clear which properties of a generated output a single "preference" score actually captures. The paper hypothesizes that preference scores are subjective and open to undesirable biases, and that they under-represent important aspects like factuality.
How does it solve the problem? The authors find that while preference scores have fairly good coverage, they under-represent important aspects like factuality. They also hypothesize that both preference scores and error annotation may be affected by confounders, such as the assertiveness and complexity of an output. To test their hypothesis, the authors use instruction-tuned models to generate outputs that vary along these two dimensions. They find that the assertiveness of an output skews the perceived rate of factuality errors, indicating that human annotations are not a fully reliable evaluation metric or training objective.
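As an illustration of what such a confounder analysis could look like (toy data and names, not the paper's), one can simply check how strongly an output's assertiveness correlates with the number of factuality errors annotators flag:

```python
# Illustrative sketch with invented numbers, not the paper's data or code:
# does a confounder such as assertiveness track how often annotators
# flag factuality errors?
from scipy.stats import spearmanr

# Hypothetical per-output records: an assertiveness rating (1-5) and the
# number of factuality errors annotators marked in that output.
assertiveness = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
flagged_errors = [3, 3, 2, 2, 2, 1, 1, 1, 0, 0]

rho, p_value = spearmanr(assertiveness, flagged_errors)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A strong negative correlation would suggest assertive-sounding outputs are
# perceived as more factual, regardless of whether they actually are.
```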
What’s next? Future work will have to carefully consider whether preference scores are well aligned with the desired objective. The authors suggest that researchers explore alternative evaluation metrics and training objectives that better capture the desired properties of LLM outputs.
3. Large Language Models for Scientific Synthesis, Inference and Explanation
What problem does it solve? Large language models (LLMs) have shown great promise in a variety of tasks, but they have yet to demonstrate advanced applications in scientific discovery. This paper shows how LLMs can be used to perform scientific synthesis, inference, and explanation.
How does it solve the problem? The authors propose a method for using general-purpose LLMs to make inferences from scientific datasets. They show that the LLM can augment its knowledge by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge, it can outperform the current state of the art across a range of benchmark tasks for predicting molecular properties. The authors also show that the LLM can explain the machine learning system's predictions.
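Here is a hedged sketch of that overall recipe, with toy rules and data standing in for what the paper actually does: LLM-proposed rules become interpretable features, a conventional classifier is trained on them, and the rule names themselves give a handle for explaining predictions. None of the names, rules, or numbers below come from the paper.

```python
# Sketch of the overall recipe (toy stand-in, not the LLM4SD implementation):
# LLM-suggested rules become interpretable features; a conventional model is
# trained on those features, and the rule list doubles as an explanation.
from sklearn.ensemble import RandomForestClassifier

# Pretend these rules were synthesized by an LLM from the scientific literature.
RULES = {
    "n_nitrogens": lambda s: s.count("N") + s.count("n"),
    "n_oxygens":   lambda s: s.count("O") + s.count("o"),
    "n_rings":     lambda s: sum(c.isdigit() for c in s) // 2,  # crude ring count from SMILES
    "length":      len,
}

def featurize(smiles: str) -> list:
    """Apply every LLM-proposed rule to a SMILES string to get a feature vector."""
    return [rule(smiles) for rule in RULES.values()]

# Tiny toy dataset: SMILES strings with made-up binary property labels.
train_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "c1ccncc1", "CCN", "O=C=O"]
train_labels = [0, 1, 0, 1, 0, 0]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit([featurize(s) for s in train_smiles], train_labels)

print(model.predict([featurize("c1ccc(N)cc1")]))       # predicted property for a new molecule
print(dict(zip(RULES, model.feature_importances_)))    # which LLM-proposed rules mattered
```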
What’s next? The LLM4SD framework might open new avenues for AI to accelerate the pace of scientific discovery. That is a rather exciting prospect, and I’m looking forward to seeing how their work is used to make new and important discoveries in science.
Papers of the Week: