Watch#6: LLMs 4 Science and How to Keep Your Models Focused
In this issue:
1. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
What problem does it solve? Large language models (LLMs) are incredibly powerful tools, but they can be expensive and slow to use in long context scenarios. Because the model has to process the entire context to generate accurate and relevant outputs, cost and latency grow with prompt length, and relevant information can get lost in a very long prompt. This becomes a real problem when the context is long, as in multi-document question answering or code completion tasks.
How does it solve the problem? LongLLMLingua is a prompt compression technique designed to improve the performance of LLMs in long context scenarios. It identifies the information in the context that matters for the question at hand and produces a much shorter prompt that retains only that information, which the LLM can then use to generate accurate and relevant outputs at lower cost.
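To make the idea concrete, here is a minimal sketch of question-aware context compression. This is not the authors' implementation: LongLLMLingua scores content with a small language model, whereas the toy `importance` function below uses simple lexical overlap with the question as a stand-in, and all names and data are invented for illustration.

```python
# Minimal sketch of question-aware prompt compression (not the authors' code).
# LongLLMLingua scores passages with a small LM; a toy lexical-overlap score
# stands in for that importance measure here.
from typing import List

def importance(passage: str, question: str) -> float:
    """Toy relevance score: fraction of question words that appear in the passage."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def compress_context(passages: List[str], question: str, token_budget: int) -> str:
    """Keep the most relevant passages until the (whitespace-token) budget is spent."""
    ranked = sorted(passages, key=lambda p: importance(p, question), reverse=True)
    kept, used = [], 0
    for passage in ranked:
        n_tokens = len(passage.split())
        if used + n_tokens > token_budget:
            continue
        kept.append(passage)
        used += n_tokens
    return "\n\n".join(kept)

if __name__ == "__main__":
    docs = [
        "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
        "Bananas are botanically berries and grow in tropical climates.",
        "Gustave Eiffel's company designed and built the tower's iron lattice.",
    ]
    prompt = compress_context(docs, "Who built the Eiffel Tower and when?", token_budget=30)
    print(prompt)  # The compressed context is then prepended to the question for the LLM.
```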
What’s next? LongLLMLingua is a promising new technique for improving the performance of LLMs in long context scenarios. It has the potential to make LLMs more accessible and affordable for a wider range of users. However, more research is needed to evaluate the performance of LongLLMLingua on a wider range of tasks and datasets.
2. Human Feedback is not Gold Standard
What problem does it solve? Human feedback is increasingly being used to evaluate and train large language models (LLMs). However, it is not clear which properties of a generated output a single "preference" score actually captures. The paper hypothesizes that preference scores are subjective and open to undesirable biases, and that they under-represent important aspects like factuality.
How does it solve the problem? The authors find that while preference scores have fairly good coverage, they under-represent important aspects like factuality. They also hypothesize that both preference scores and error annotation may be affected by confounders, such as the assertiveness and complexity of an output. To test their hypothesis, the authors use instruction-tuned models to generate outputs that vary along these two dimensions. They find that the assertiveness of an output skews the perceived rate of factuality errors, indicating that human annotations are not a fully reliable evaluation metric or training objective.
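As an illustration of what such a confounder analysis could look like (toy data and names, not the paper's), one can simply check how strongly an output's assertiveness correlates with the number of factuality errors annotators flag:

```python
# Illustrative sketch with invented numbers, not the paper's data or code:
# does a confounder such as assertiveness track how often annotators
# flag factuality errors?
from scipy.stats import spearmanr

# Hypothetical per-output records: an assertiveness rating (1-5) and the
# number of factuality errors annotators marked in that output.
assertiveness = [1, 2, 2, 3, 3, 4, 4, 5, 5, 5]
flagged_errors = [3, 3, 2, 2, 2, 1, 1, 1, 0, 0]

rho, p_value = spearmanr(assertiveness, flagged_errors)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
# A strong negative correlation would suggest assertive-sounding outputs are
# perceived as more factual, regardless of whether they actually are.
```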
What’s next? Future work will have to carefully consider whether preference scores are well aligned with the desired objective. The authors suggest that researchers explore alternative evaluation metrics and training objectives that better capture the desired properties of LLM outputs.
3. Large Language Models for Scientific Synthesis, Inference and Explanation
What problem does it solve? Large language models (LLMs) have shown great promise in a variety of tasks, but they have yet to demonstrate advanced applications in scientific discovery. This paper shows how LLMs can be used to perform scientific synthesis, inference, and explanation.
How does it solve the problem? The authors propose a method for using general-purpose LLMs to make inferences from scientific datasets. They show that the LLM can augment its knowledge by synthesizing from the scientific literature. When a conventional machine learning system is augmented with this synthesized and inferred knowledge, it can outperform the current state of the art across a range of benchmark tasks for predicting molecular properties. The authors also show that the LLM can explain the machine learning system's predictions.
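Here is a hedged sketch of that overall recipe, with toy rules and data standing in for what the paper actually does: LLM-proposed rules become interpretable features, a conventional classifier is trained on them, and the rule names themselves give a handle for explaining predictions. None of the names, rules, or numbers below come from the paper.

```python
# Sketch of the overall recipe (toy stand-in, not the LLM4SD implementation):
# LLM-suggested rules become interpretable features; a conventional model is
# trained on those features, and the rule list doubles as an explanation.
from sklearn.ensemble import RandomForestClassifier

# Pretend these rules were synthesized by an LLM from the scientific literature.
RULES = {
    "n_nitrogens": lambda s: s.count("N") + s.count("n"),
    "n_oxygens":   lambda s: s.count("O") + s.count("o"),
    "n_rings":     lambda s: sum(c.isdigit() for c in s) // 2,  # crude ring count from SMILES
    "length":      len,
}

def featurize(smiles: str) -> list:
    """Apply every LLM-proposed rule to a SMILES string to get a feature vector."""
    return [rule(smiles) for rule in RULES.values()]

# Tiny toy dataset: SMILES strings with made-up binary property labels.
train_smiles = ["CCO", "c1ccccc1", "CC(=O)O", "c1ccncc1", "CCN", "O=C=O"]
train_labels = [0, 1, 0, 1, 0, 0]

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit([featurize(s) for s in train_smiles], train_labels)

print(model.predict([featurize("c1ccc(N)cc1")]))       # predicted property for a new molecule
print(dict(zip(RULES, model.feature_importances_)))    # which LLM-proposed rules mattered
```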
What’s next? The LLM4SD framework might open new avenues for AI to accelerate the pace of scientific discovery. That is a rather exciting prospect, and I’m looking forward to seeing how their work is used to make new and important discoveries in science.
Papers of the Week: