Watch#6: LLMs 4 Science and How to Keep Your Models Focused

In this issue:

  1. Pruning everything that’s irrelevant
  2. Human Feedback might not be as awesome as we think it is
  3. LLMs discovering scientific concepts


1. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

Watching: LongLLMLingua (paper/code)

What problem does it solve? Large language models (LLMs) are incredibly powerful tools, but they become expensive and slow in long context scenarios: cost and latency grow with prompt length, and relevant information can get lost in the middle of a long context. This hurts tasks that naturally come with long inputs, such as multi-document QA or code completion.

How does it solve the problem? LongLLMLingua is a prompt compression technique for long context scenarios. A smaller language model scores how relevant each part of the context is to the question at hand, and only the most informative documents and tokens are kept. The resulting compressed prompt is much shorter but preserves the key information, so the LLM can still generate accurate and relevant outputs, at a fraction of the cost and latency.
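
To get a feel for the workflow, here is a minimal usage sketch built on the authors' llmlingua package (pip install llmlingua). The parameter names follow the project README as I remember it, so treat them as assumptions and check them against the current release:

```python
from llmlingua import PromptCompressor

# Loads a small causal LM (Llama-2-7B by default) that scores how relevant
# each document and token is to the question; the compressed prompt is then
# sent to the large, expensive LLM.
compressor = PromptCompressor()

documents = [
    "Passage about the 2000 Summer Olympics in Sydney ...",
    "Passage about the 2004 Summer Olympics in Athens ...",
    "Unrelated passage about marathon world records ...",
]
question = "Where were the 2004 Summer Olympics held?"

result = compressor.compress_prompt(
    documents,
    question=question,
    rank_method="longllmlingua",    # question-aware document ranking
    target_token=200,               # token budget for the compressed prompt
    condition_in_question="after",  # condition token importance on the question
    reorder_context="sort",         # put the most relevant documents first
)

print(result["compressed_prompt"])  # feed this to the expensive LLM instead
print(result["origin_tokens"], "->", result["compressed_tokens"])
```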

What’s next? LongLLMLingua has the potential to make long-context applications of LLMs cheaper, faster, and therefore accessible to a wider range of users. However, more research is needed to evaluate how well it performs across a broader range of tasks and datasets.


2. Human Feedback is not Gold Standard

Watching: Human Feedback (paper/code)

What problem does it solve? Human feedback is increasingly being used to evaluate and train large language models (LLMs). However, it is not clear which properties of a generated output a single "preference" score actually captures. The authors hypothesize that preference scores are subjective, open to undesirable biases, and under-represent important aspects like factuality.

How does it solve the problem? The authors find that while preference scores have fairly good coverage, they under-represent important aspects like factuality. They further hypothesize that both preference scores and error annotations are affected by confounders, such as the assertiveness and complexity of an output. To test this, they use instruction-tuned models to generate outputs that vary along these two dimensions and find that the assertiveness of an output skews the perceived rate of factuality errors. This indicates that human annotations are not a fully reliable evaluation metric or training objective.
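
To make the confounder experiment concrete, here is a hedged sketch of that kind of probe: steer an instruction-tuned model toward more or less assertive phrasings, then compare how often annotators flag factuality errors in each condition. The generate and collect_annotation helpers are hypothetical stand-ins for an LLM API call and a human-annotation pipeline, not the authors' code:

```python
from collections import defaultdict

# Style instructions that vary assertiveness while keeping content fixed.
STYLE_PROMPTS = {
    "assertive": "Answer confidently, without hedging or caveats.",
    "neutral":   "Answer the question.",
    "hedged":    "Answer cautiously, noting uncertainty where it exists.",
}

def generate(question: str, style_instruction: str) -> str:
    raise NotImplementedError  # hypothetical: call your LLM of choice here

def collect_annotation(output: str) -> bool:
    raise NotImplementedError  # hypothetical: True if annotators flag a factual error

def perceived_error_rates(questions: list[str]) -> dict[str, float]:
    flags = defaultdict(list)
    for question in questions:
        for style, instruction in STYLE_PROMPTS.items():
            output = generate(question, instruction)
            flags[style].append(collect_annotation(output))
    # With unbiased annotators these rates should roughly match across
    # styles; the paper finds assertive outputs are flagged less often.
    return {style: sum(v) / len(v) for style, v in flags.items()}
```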

What’s next? Future work will have to carefully consider whether preference scores are well aligned with the desired objective. The authors suggest exploring alternative evaluation metrics and training objectives that better capture the desired properties of LLM outputs.


3. Large Language Models for Scientific Synthesis, Inference and Explanation

Watching: LLM4SD (paper/code)

What problem does it solve? Large language models (LLMs) have shown great promise across a variety of tasks, but advanced applications in scientific discovery have so far been lacking. This paper shows how LLMs can be used for scientific synthesis, inference, and explanation.

How does it solve the problem? The authors propose a method for using general-purpose LLMs to make inferences from scientific datasets. The LLM first augments its knowledge by synthesizing rules from the scientific literature and then infers additional rules directly from the data. When a conventional machine learning system is augmented with this synthesized and inferred knowledge, it outperforms the current state of the art across a range of benchmark tasks for predicting molecular properties. Because the knowledge takes the form of human-readable rules, the LLM can also explain the machine learning system's predictions.
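
As an illustration of the idea (not the authors' exact pipeline), here is a minimal sketch under stated assumptions: an LLM is prompted for human-readable rules, each rule is translated into a numeric feature, and a conventional model is trained on those features. The rules, the toy blood-brain-barrier task, and the labels are all made up for illustration; it requires rdkit and scikit-learn:

```python
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestClassifier

# Features derived from hypothetical LLM-synthesized rules, e.g.
# "smaller, more lipophilic molecules cross the blood-brain barrier".
RULE_FEATURES = {
    "mol_weight": Descriptors.MolWt,       # rule: low molecular weight helps
    "logp":       Descriptors.MolLogP,     # rule: lipophilicity helps
    "tpsa":       Descriptors.TPSA,        # rule: low polar surface area helps
    "h_donors":   Descriptors.NumHDonors,  # rule: few H-bond donors help
}

def featurize(smiles: str) -> list[float]:
    mol = Chem.MolFromSmiles(smiles)
    return [feature(mol) for feature in RULE_FEATURES.values()]

# Toy data: (SMILES, crosses the blood-brain barrier?) -- illustrative labels.
train = [
    ("CCO", 1),                       # ethanol
    ("CC(=O)Oc1ccccc1C(=O)O", 1),     # aspirin
    ("C(C1C(C(C(C(O1)O)O)O)O)O", 0),  # glucose
]
X = [featurize(smiles) for smiles, _ in train]
y = [label for _, label in train]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Because every feature maps back to a stated rule, feature importances give
# a human-readable account of what drives the model's predictions.
print(dict(zip(RULE_FEATURES, clf.feature_importances_)))
```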

What’s next? The LLM4SD framework might open new avenues for AI to accelerate the pace of scientific discovery. That is a rather exciting prospect, and I’m looking forward to seeing how their work is used to make new and important discoveries in science.

